Abstract



Working in Winfree's abstract tile assembly model, we show that a constant-size tile assembly
system can be programmed through relative tile concentrations to build an n x n square with
high probability, for any sufficiently large n. This answers an open question of Kao and Schweller
(Randomized Self-Assembly for Approximate Shapes, ICALP 2008), who showed how to build
an approximately n x n square using tile concentration programming, and asked whether the
approximation could be made exact with high probability. We show how this technique can be
modified to answer another question of Kao and Schweller, by showing that a constant-size tile
assembly system can be programmed through tile concentrations to assemble arbitrary finite
scaled shapes, which are shapes modified by replacing each point with a c x c block of points, for
some integer c. Furthermore, we exhibit a smooth tradeoff between specifying bits of n via tile
concentrations versus specifying them via hard-coded tile types, which allows tile concentration
programming to be employed for specifying a fraction of the bits of "input" to a tile assembly
system, under the constraint that concentrations can only be specified to a limited precision.
Finally, to account for some unrealistic aspects of the tile concentration programming model,
we show how to modify the construction to use only concentrations that are arbitrarily close to
uniform.

1 Introduction

Self-assembly is a term used to describe systems in which a small number of simple components,
each following local rules governing their interaction with each other, automatically assemble to
form a target structure. Winfree [27] introduced the abstract Tile Assembly Model (aTAM) - based
on a constructive version of Wang tiling [25,26] - as a simplified mathematical model of Seeman's
work [20] in utilizing DNA to physically implement self-assembly at the molecular level. In the 
aTAM, the fundamental components are un-rotatable, but translatable square "tile types" whose 
sides are labeled with glue "labels" and "strengths." Two tiles placed next to each other interact if 
the glue labels on their abutting sides match, and a tile binds to an assembly if the total strength
on all of its interacting sides is at least the ambient "temperature," equal to 2 in this paper. The
model is detailed more formally in Section 2. 



*A preliminary version of this article appeared as [9]. 
Winfree [27] demonstrated the computational universality of the aTAM by showing how to
simulate an arbitrary cellular automaton with a tile assembly system. Building on these connections
to computability, Rothemund and Winfree [17] investigated the minimum number of tile types
needed to uniquely assemble an n x n square. Utilizing the theory of Kolmogorov complexity,
they show that for any algorithmically random n, $\Omega\left(\frac{\log n}{\log \log n}\right)$ tile types are required to uniquely
assemble an n x n square, and Adleman, Cheng, Goel, and Huang [1] exhibit a construction showing
that this lower bound is asymptotically tight.

Real-life implementations of the aTAM involve (at the present time) creating tile types out of 
DNA double-crossover molecules [18], copies of which can be created at an exponential rate using
the polymerase chain reaction (PCR) [19]. PCR technology has advanced to the point where it
is automated by machines, meaning that copies of tiles are easy to supply, whereas the number 
of distinct tile types is a precious resource, costing much more lab time to create. Therefore, 
effort has been put towards developing methods of "programming" tile sets through methods other 
than hard-coding the desired behavior into the tile types. Such methods include temperature 
programming [10, 24], which involves changing the ambient temperature through the assembly 
process in order to alter which bonds are possible to break or create, and staged assembly [8], 
which involves preparing different assemblies in different test tubes, which are then mixed after 
reaching a terminal state. Each of these models allows a single tile set to be reused for assembling 
different structures by programming different environmental conditions that affect the behavior of 
the tiles and therefore serve as an "input" to be processed by the tile set. 

The "input specification model" used in this paper is known as tile concentration programming. 
If the tile assembly system is nondeterministic - if intermediate assemblies exist in which more than 
one tile type is capable of binding to the same position - and if the solution is well-mixed, then the 
relative concentrations of these tile types determine the probability that each tile type will be the 
one to bind. Tile concentrations affect the expected time before an assembly is completed (such 
a model is considered in [1] and [2], for instance), but we ignore such running time considerations 
in the present paper. We instead focus on using the biased randomness of tile concentrations to 
guide a probabilistic shape-building algorithm, subject to a certain kind of "geometric space bound";
namely, that the algorithm must be executed within the confines of the shape being assembled. 
This restriction follows from the monotone nature of the aTAM: once a tile attaches to an assembly, 
it never detaches. 

We now describe related work. Chandran, Gopalkrishnan, and Reif [5] show that a one-dimensional
line of expected length n can be assembled using $\Theta(\log n)$ tile types, subject to the
restriction that all tile concentrations are equal. Furthermore, they show that this bound is tight for
all n. Note that this is not tile concentration programming since the concentrations are forced to be
equal. Nonetheless, they use the inherent randomness of binding competition to strictly improve
the assembly capabilities of the aTAM; a simple pigeonhole argument shows that n unique tile
types are required to construct a line of length n in the deterministic aTAM model. Two previous
papers [2,11] deal directly with the tile concentration programming model. Becker, Rapaport, and
Remila [2] show that there is a single tile assembly system $\mathcal{T}$ such that, for all $n \in \mathbb{N}$, setting the
tile concentrations appropriately causes $\mathcal{T}$ to assemble an n' x n' square, such that n' has expected
value n. However, n' will have a large deviation from n with non-negligible probability. Kao and
Schweller [11] improve this result by constructing, for each $\delta, \epsilon > 0$, a tile assembly system $\mathcal{T}$ such
that setting the tile concentrations appropriately causes $\mathcal{T}$ to assemble an n' x n' square, where
$(1 - \epsilon)n \le n' \le (1 + \epsilon)n$ with probability at least $1 - \delta$, for sufficiently large $n \in \mathbb{Z}^+$ (depending on
$\delta$ and $\epsilon$).

Kao and Schweller asked whether a constant-sized tile assembly system could be constructed
that, through tile concentration programming, would assemble a square of dimensions exactly n x n,
with high probability. We answer this question affirmatively, showing that, for each $\delta > 0$, there
is a tile assembly system $\mathcal{T}$ such that, for sufficiently large $n \in \mathbb{Z}^+$, there is an assignment of
tile concentrations to $\mathcal{T}$ such that $\mathcal{T}$ assembles an n x n square with probability at least $1 - \delta$.
Therefore, with a constant number of tile types, any size square can be created entirely through the
programming of tile concentrations. The primary technique is a tile set that, through appropriate
tile concentration programming, forms a thin structure of height $O(\log n)$ and length $O(n^{2/3})$ (and
for arbitrarily small $\epsilon > 0$, the length can be made $O(n^{\epsilon})$) and encodes the value of n in binary.
This binary string could be used to assemble useful structures other than squares, such as rectangles
and other supersets of the sampling structure that are "easily encoded" in a binary string of length
$O(\log n)$.

Kao and Schweller also asked whether arbitrary finite connected shapes, possibly scaled by
factor $c \in \mathbb{N}$ (depending on the shape) by replacing each point in the shape with a c x c block
of points, could be assembled from a constant tile set through concentration programming. Our
construction answering the first question computes the binary expansion of n with high probability
in a self-assembled rectangle of height $O(\log n)$ and width $O(n^{2/3})$. By assembling this structure
within the "seed block" of the construction of [23], our construction can easily be combined with 
that of [23] to answer this question affirmatively as well, by replacing the number n with a program 
that outputs a list of points in the shape, and using this as the "seed block" of the construction 
of [23]. 

Since it may be infeasible to specify tile concentrations with unlimited precision, we show how 
to generalize our construction to allow a smooth tradeoff between specifying the number n through 
tile concentrations versus hard-coded tile types. Since $\log n$ bits are required to specify n for almost
all values of n, we show that for arbitrary g, it is possible to specify "about" g of the bits through
tile concentrations and the remaining "about" $\log n - g$ bits through the hard-coding of tile types;
i.e., using a tile set that can be described with about $(\log n) - g + o(\log n)$ bits. The actual bound
is complicated and is stated in Theorem 5.3.

Finally, there are some unrealistic aspects of the concentration programming model, in addition 
to the assumption that concentrations can be specified to unlimited precision. Chiefly, the aTAM is
itself a kinetically implausible model, but Winfree showed that the behavior of the aTAM can be
approximated to arbitrary accuracy by the more realistic kinetic Tile Assembly Model (kTAM) [27]. 
One of the assumptions Winfree employs to achieve this approximation is that all tile types have 
equal concentration, a condition clearly violated by our intentional setting of concentrations to 
be unequal. We will argue that our particular construction avoids the potential pitfalls of the 
concentration programming model, but leave open the task of defining a concentration programming 
model that is inherently immune to these pitfalls. We also show how to alter the construction to use 
only concentrations that are arbitrarily close to uniform, as a potential fix for the kinetic problems. 

This paper is organized as follows. Section 2 provides background definitions and notation 
for the abstract TAM and tile concentration programming. Section 3 specifies and proves the 
correctness of the main construction, a tile set that can be used to assemble precisely-sized squares
through concentration programming. The results of Section 3 first appeared in [9]. Section 4
specifies the construction of a tile set that assembles scaled versions of arbitrary finite shapes 
through concentration programming. This scaled shapes construction was announced in [9] but not 
demonstrated. Section 5 discusses a relaxation of the model that allows a greater than constant
number of tile types, while using fewer bits to specify the concentrations of each tile type, and
shows how to achieve a smooth tradeoff between the resources of "number of tile types" and "bits
of precision of concentrations", to assemble squares, while using an asymptotically optimal number
of total bits of precision (i.e., the bits of precision of concentrations, plus the bits needed to describe
the tile types, are $O(\log n)$ for an n x n square). Section 6 discusses some unrealistic aspects of the
concentration programming model, and argues that the constructions of this paper are resistant to
the problems caused by those unrealistic assumptions, or can be fixed to alleviate these problems.
Section 7 concludes the paper, discusses practical limitations of the construction and potential
improvements, and states open questions.

2 The Tile Assembly Model and Tile Concentration Programming 

We give a brief sketch of the Tile Assembly Model. More details and discussion may be found 
in [12,16,17,27]. Our notation is that of [12], which provides a more detailed and self-contained 
introduction to the Tile Assembly Model for the reader unfamiliar with the model. 

All logarithms in this paper are base 2. We work in the 2-dimensional discrete space $\mathbb{Z}^2$. Define
the set $U_2 = \{(0, 1), (1, 0), (0, -1), (-1, 0)\}$ to be the set of all unit vectors, i.e., vectors of length 1
in $\mathbb{Z}^2$. We write $[X]^2$ for the set of all 2-element subsets of a set $X$. All graphs in this paper are
undirected graphs, i.e., ordered pairs $G = (V, E)$, where $V$ is the set of vertices and $E \subseteq [V]^2$ is
the set of edges.

Intuitively, a tile type $t$ is a unit square that can be translated, but not rotated, having a well-defined
"side $\vec{u}$" for each $\vec{u} \in U_2$. Each side $\vec{u}$ of $t$ has a "glue" with "label" $\mathrm{label}_t(\vec{u})$ - a string
over some fixed alphabet $\Sigma$ - and "strength" $\mathrm{str}_t(\vec{u})$ - a nonnegative integer - specified by its type
$t$. Two tiles $t$ and $t'$ that are placed at the points $\vec{a}$ and $\vec{a} + \vec{u}$ respectively, bind with strength
$\mathrm{str}_t(\vec{u})$ if and only if $(\mathrm{label}_t(\vec{u}), \mathrm{str}_t(\vec{u})) = (\mathrm{label}_{t'}(-\vec{u}), \mathrm{str}_{t'}(-\vec{u}))$. In our figures, we follow the
convention of representing strength-0 bonds with dashed lines, strength-1 bonds with single lines,
and strength-2 bonds with double lines.

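Given a set $T$ of tile types, an assembly is a partial function $\alpha : \mathbb{Z}^2 \dashrightarrow T$, with points $\vec{x} \in \mathbb{Z}^2$
at which $\alpha(\vec{x})$ is undefined interpreted to be empty space, so that $\mathrm{dom}\ \alpha$ is the set of points with
tiles. We write $|\alpha|$ to denote $|\mathrm{dom}\ \alpha|$, and we say $\alpha$ is finite if $|\alpha|$ is finite. For assemblies $\alpha$ and $\beta$,
we say that $\alpha$ is a subassembly of $\beta$, and write $\alpha \sqsubseteq \beta$, if $\mathrm{dom}\ \alpha \subseteq \mathrm{dom}\ \beta$ and $\alpha(\vec{x}) = \beta(\vec{x})$ for all
$\vec{x} \in \mathrm{dom}\ \alpha$. $\beta$ is a single-tile extension of $\alpha$ if $\alpha \sqsubseteq \beta$ and $\mathrm{dom}\ \beta \setminus \mathrm{dom}\ \alpha$ is a singleton set. In this
case, we write $\beta = \alpha + (\vec{m} \mapsto t)$, where $\{\vec{m}\} = \mathrm{dom}\ \beta \setminus \mathrm{dom}\ \alpha$ and $t = \beta(\vec{m})$.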

A grid graph is a graph $G = (V, E)$ in which $V \subseteq \mathbb{Z}^2$ and every edge $\{\vec{a}, \vec{b}\} \in E$ has the
property that $\vec{a} - \vec{b} \in U_2$. The binding graph of an assembly $\alpha$ is the grid graph $G_\alpha = (V, E)$,
where $V = \mathrm{dom}\ \alpha$, and $\{\vec{m}, \vec{n}\} \in E$ if and only if (1) $\vec{m} - \vec{n} \in U_2$, and (2) $\alpha(\vec{m})$ and $\alpha(\vec{n})$ bind
with positive strength. An assembly is $\tau$-stable, where $\tau \in \mathbb{N}$, if it cannot be broken up into smaller
assemblies without breaking bonds of total strength at least $\tau$; i.e., if every cut of $G_\alpha$ has weight
at least $\tau$, where the weight of an edge is the strength of the glue it represents. In contrast to
the model of Wang tiling, the nonnegativity of the strength function implies that glue mismatches
between adjacent tiles do not prevent a tile from binding to an assembly, so long as sufficient
binding strength is received from the sides of the tile at which the glues match. The frontier of an
assembly $\alpha$ is $\partial\alpha = \bigcup_{t \in T} \{\, \vec{m} \mid \vec{m} \notin \mathrm{dom}\ \alpha \text{ and } \alpha + (\vec{m} \mapsto t) \text{ is } \tau\text{-stable} \,\}$, the set of locations at
which a single tile could be stably added to $\alpha$.
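The binding rule and the frontier can be illustrated with a short Python sketch. This is not part of the construction; the names `TileType`, `bind_strength`, `attachment_strength`, and `frontier` are hypothetical, and the sketch assumes the existing assembly is already stable, so that a new tile can be stably added exactly when its own bonds have total strength at least the temperature.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple

Point = Tuple[int, int]
Glue = Tuple[str, int]  # (label, strength)

# The unit vectors U_2, keyed by the side of the tile they point out of.
U2: Dict[str, Point] = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
NO_GLUE: Glue = ("", 0)


class TileType(NamedTuple):
    name: str
    glues: Dict[str, Glue]  # side -> glue; sides not listed carry no glue


def bind_strength(t: TileType, side: str, neighbor: TileType) -> int:
    """Strength of the bond between `side` of t and the abutting side of neighbor."""
    mine = t.glues.get(side, NO_GLUE)
    theirs = neighbor.glues.get(OPPOSITE[side], NO_GLUE)
    # Glues interact only if both the label and the strength agree.
    return mine[1] if mine == theirs else 0


def attachment_strength(assembly: Dict[Point, TileType], pos: Point, t: TileType) -> int:
    """Total strength with which t would bind if placed at the empty position pos."""
    total = 0
    for side, (dx, dy) in U2.items():
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        if neighbor is not None:
            total += bind_strength(t, side, neighbor)
    return total


def frontier(assembly: Dict[Point, TileType], tile_types: Iterable[TileType],
             tau: int = 2) -> Set[Point]:
    """Empty positions at which some tile type binds with strength at least tau."""
    tile_types = list(tile_types)
    empty_neighbors = {
        (x + dx, y + dy) for (x, y) in assembly for (dx, dy) in U2.values()
    } - set(assembly)
    return {
        pos for pos in empty_neighbors
        if any(attachment_strength(assembly, pos, t) >= tau for t in tile_types)
    }
```

At temperature 2, a tile offering a single strength-1 glue cannot attach on its own, while two matching strength-1 glues on different sides can cooperate to reach the threshold.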
Self-assembly begins with a seed assembly $\sigma$ (typically assumed to be finite and $\tau$-stable) and
proceeds asynchronously and nondeterministically,¹ with tiles adsorbing one at a time to the existing
assembly in any manner that preserves stability at all times, formally modeled as follows.

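A tile assembly system (TAS) is an ordered triple $\mathcal{T} = (T, \sigma, \tau)$, where $T$ is a finite set of tile
types, $\sigma : \mathbb{Z}^2 \dashrightarrow T$ is the finite, $\tau$-stable seed assembly, and $\tau \in \mathbb{N}$ is the temperature, equal to 2
in this paper.² $\mathcal{T}$ is singly-seeded if $|\mathrm{dom}\ \sigma| = 1$. An assembly sequence of a TAS $\mathcal{T} = (T, \sigma, 2)$ is
a (finite or countably infinite) sequence $\vec{\alpha} = (\alpha_i \mid 0 \le i < k)$ (with $k \in \mathbb{N} \cup \{\infty\}$) of assemblies
in which $\alpha_0 = \sigma$, each $\alpha_{i+1}$ is a single-tile extension of $\alpha_i$, and each $\alpha_i$ is $\tau$-stable. The result
of $\vec{\alpha}$ is the unique assembly $\mathrm{res}(\vec{\alpha})$ such that $\mathrm{dom}\ \mathrm{res}(\vec{\alpha}) = \bigcup_{0 \le i < k} \mathrm{dom}\ \alpha_i$ and, for all $0 \le i < k$,
$\alpha_i \sqsubseteq \mathrm{res}(\vec{\alpha})$. In the case that $k$ is finite, it is routine to verify that $\mathrm{res}(\vec{\alpha}) = \alpha_{k-1}$. We write $\mathcal{A}[\mathcal{T}]$
to denote the set of all results of assembly sequences of $\mathcal{T}$ starting with the seed assembly, known
as the producible assemblies of $\mathcal{T}$. An assembly $\alpha$ is terminal if no tile can be stably added to
it; i.e., if $\partial\alpha = \emptyset$. If $\alpha$ is producible and terminal, we write $\alpha \in \mathcal{A}_\Box[\mathcal{T}]$. An assembly sequence
$\vec{\alpha} = (\alpha_i \mid 0 \le i < k)$ is fair if for all $i$ and all $\vec{m} \in \partial\alpha_i$, there exists $j$ such that $\alpha_j(\vec{m})$ is defined;
i.e., no frontier location is "starved". It is routine to verify that $\vec{\alpha}$ is fair if and only if $\mathrm{res}(\vec{\alpha})$ is
terminal.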

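A tile concentration assignment on $\mathcal{T}$ is a function $\rho : T \to [0, \infty)$.³ If $\rho(t)$ is not specified
explicitly for some $t \in T$, then $\rho(t) = 1$. $\rho$ induces a probability measure $P_\rho : \mathcal{A}_\Box[\mathcal{T}] \to [0, 1]$
in the following way. Let $\alpha \in \mathcal{A}_\Box[\mathcal{T}]$ be a producible, terminal assembly. Let $A(\alpha)$ be the
set of all assembly sequences $\vec{\alpha} = (\alpha_i \mid 0 \le i < k)$ such that $\mathrm{res}(\vec{\alpha}) = \alpha$. Write $T_{\alpha_i}(\vec{m}) =
\{\, t \in T \mid \alpha_i + (\vec{m} \mapsto t) \text{ is } \tau\text{-stable} \,\}$ for the set of tile types $t$ that are stably attachable at position
$\vec{m} \in \partial\alpha_i$. Let $f_{\alpha_i}(\vec{m}) = \sum_{t \in T_{\alpha_i}(\vec{m})} \rho(t)$. Define the frontier selection probability

$$p_{\alpha_i}(\vec{m}) = \frac{f_{\alpha_i}(\vec{m})}{\sum_{\vec{n} \in \partial\alpha_i} f_{\alpha_i}(\vec{n})}.$$

This quantity is the probability that $\vec{m}$ is the position of next attachment to $\alpha_i$. Let $t \in T_{\alpha_i}(\vec{m})$
and define the tile selection probability

$$p_{\alpha_i}(t \mid \vec{m}) = \frac{\rho(t)}{f_{\alpha_i}(\vec{m})}.$$

This quantity is the conditional probability that $t$ attaches to position $\vec{m}$ of $\alpha_i$, given that $\vec{m} \in \partial\alpha_i$
is the frontier location that is tiled at stage $i$ of assembly. Define $\vec{m}_{\vec{\alpha},i} \in \partial\alpha_i$ to be the frontier
location that is tiled in $\alpha_i$ to create $\alpha_{i+1}$, and let $t_{\vec{\alpha},i} = \alpha_{i+1}(\vec{m}_{\vec{\alpha},i})$ be the tile type placed there. We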



¹There are multiple senses in which a tile system can be nondeterministic. One sense is that the location of
attachment, if there is more than one candidate, is selected nondeterministically. Such systems may still be deterministic
in the sense that they will lead to a unique final assembly. We employ a stronger version of nondeterminism in which 
the tile capable of binding to a single position of an assembly is not fixed; the randomized algorithm we implement 
relies on this choice being made according to the tile concentrations. 

²A tile set can be "programmed" with different inputs through selection of an appropriate seed assembly. In this
paper, we wish to model the situation in which, once work has been done once to create a single tile set, the tile
set can be programmed entirely through adjustment of tile concentrations. Hence, our result is stated in terms of
the existence of a tile assembly system, with a fixed seed assembly (in fact, a single seed tile), that can be used to
construct squares of any size, solely by adjusting the tile concentrations.

³Note in particular that we do not require $\rho$ to be a probability measure on $T$.
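define the probability measure $P_\rho : \mathcal{A}_\Box[\mathcal{T}] \to [0, 1]$ as

$$P_\rho(\alpha) = \sum_{\vec{\alpha} = (\alpha_i \mid 0 \le i < k) \in A(\alpha)} \ \prod_{i=0}^{k-2} p_{\alpha_i}(\vec{m}_{\vec{\alpha},i}) \, p_{\alpha_i}(t_{\vec{\alpha},i} \mid \vec{m}_{\vec{\alpha},i}).$$

By the identity $\Pr(A \text{ and } B) = \Pr(A)\Pr(B \mid A)$, the quantity $p_{\alpha_i}(\vec{m}_{\vec{\alpha},i}) \, p_{\alpha_i}(t_{\vec{\alpha},i} \mid \vec{m}_{\vec{\alpha},i})$ is the
probability that $\vec{m}_{\vec{\alpha},i}$ is the next location of attachment and that $t_{\vec{\alpha},i}$ is the tile type to be placed there.

For an event $E$, write $\Pr_{\alpha \xleftarrow{P_\rho} \mathcal{A}_\Box[\mathcal{T}]}[E]$ to denote the probability that $E$ happens when $\alpha$ is sampled
according to distribution $P_\rho$.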
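As a small worked illustration of one assembly step, the following Python sketch computes the frontier selection probability and the tile selection probability from a concentration assignment. The function names and the dictionary-based input format are hypothetical, and the tile selection probability is taken to be a tile type's concentration divided by the total concentration of the tile types attachable at that location, which is an assumption of this sketch.

```python
from fractions import Fraction
from typing import Dict, Hashable, List


def total_concentration(tiles: List[str], rho: Dict[str, Fraction]) -> Fraction:
    """f(m): summed concentration of the tile types attachable at one location.

    Tile types missing from rho get the default concentration 1.
    """
    return sum((Fraction(rho.get(t, 1)) for t in tiles), Fraction(0))


def frontier_selection_probs(attachable: Dict[Hashable, List[str]],
                             rho: Dict[str, Fraction]) -> Dict[Hashable, Fraction]:
    """p(m) = f(m) / sum over frontier locations n of f(n)."""
    f = {m: total_concentration(tiles, rho) for m, tiles in attachable.items()}
    total = sum(f.values(), Fraction(0))
    return {m: f_m / total for m, f_m in f.items()}


def tile_selection_probs(tiles_at_m: List[str],
                         rho: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """p(t | m) = rho(t) / f(m) for each tile type attachable at location m."""
    f_m = total_concentration(tiles_at_m, rho)
    return {t: Fraction(rho.get(t, 1)) / f_m for t in tiles_at_m}


if __name__ == "__main__":
    # Two frontier locations: "go" and "stop" compete at (1, 0); only "X" fits at (0, 1).
    rho = {"go": Fraction(3, 4), "stop": Fraction(1, 4)}
    attachable = {(1, 0): ["go", "stop"], (0, 1): ["X"]}
    print(frontier_selection_probs(attachable, rho))
    print(tile_selection_probs(attachable[(1, 0)], rho))
```

Exact rational arithmetic is used so that the probabilities at each step sum to exactly 1.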

For $A, B \subseteq \mathbb{Z}^2$ and $\vec{u} \in \mathbb{Z}^2$, we write $A + \vec{u}$ to denote the set $\{\, \vec{v} + \vec{u} \mid \vec{v} \in A \,\}$, and we write
$A \simeq B$ if there exists $\vec{u}$ such that $A + \vec{u} = B$; i.e., if $A$ is a translation of $B$. For $p \in [0, 1]$ and $X \subseteq \mathbb{Z}^2$,
we say $X$ strictly self-assembles in $\mathcal{T}(\rho)$ with probability at least $p$ if $\Pr_{\alpha \xleftarrow{P_\rho} \mathcal{A}_\Box[\mathcal{T}]}[\mathrm{dom}\ \alpha \simeq X] \ge p$.
That is, $\mathcal{T}$ self-assembles into the same shape as $X$ with probability at least $p$. Note that two
different assemblies may have the same shape though they might assign different tile types to the
same position.

The definition of Pp takes into account not only the tile selection probability, the effect of tile 
concentrations on which tile type is selected when more than one compete to bind to a single frontier 
location, but also the frontier selection probability, which of multiple frontier locations is selected. 
However, all constructions in this paper are correct so long as the assembly sequence is fair. By the 
following observation, the assembly sequence can be assumed fair so long as all concentrations are 
strictly positive, implying that we need not consider the frontier selection probability when arguing 
the correctness of the constructions. Obviously, there are finite assembly sequences $\vec{\alpha}$ occurring
with positive probability such that $\mathrm{res}(\vec{\alpha})$ is not terminal. However, we want to establish that as
long as growth is allowed to continue whenever the assembly is nonterminal, the probability of the 
assembly sequence being fair is 1. Therefore the observation is stated only for infinite assembly 
sequences. 

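Observation 2.1. Let $\mathcal{T} = (T, \sigma, \tau)$ be a TAS, let $\rho : T \to (0, \infty)$ be a strictly positive tile
concentration assignment, and let $\vec{\alpha}$ be an infinite assembly sequence resulting from the assembly
of $\mathcal{T}$ according to $\rho$ as described above. Then $\Pr[\vec{\alpha} \text{ is fair}] = 1$.

Proof. Let $\vec{\alpha} = (\alpha_i \mid 0 \le i < \infty)$, where $\alpha_0 = \sigma$, and consider assembly $\alpha_i$ for some $i$. Since each tile
addition increases the size of the frontier by at most 3, $|\partial\alpha_i| \le 3i + |\partial\sigma|$. Define $f_{\min} = \min_{t \in T} \rho(t)$,
$f_{\max} = \sum_{t \in T} \rho(t)$, and $f_{\mathrm{ratio}} = f_{\min}/f_{\max}$. Let $0 \le p < k$ be some stage where $\alpha_p$ is not terminal,
and let $\vec{m} \in \partial\alpha_p$. It suffices to show that $\Pr[\vec{m} \text{ is never tiled}] = 0$. Note that $f_{\min} \le f_{\alpha_i}(\vec{m}) \le f_{\max}$
for all $i$. Then we have

$$p_{\alpha_i}(\vec{m}) = \frac{f_{\alpha_i}(\vec{m})}{\sum_{\vec{n} \in \partial\alpha_i} f_{\alpha_i}(\vec{n})} \ge \frac{f_{\min}}{\sum_{\vec{n} \in \partial\alpha_i} f_{\max}} = \frac{f_{\mathrm{ratio}}}{|\partial\alpha_i|} \ge \frac{f_{\mathrm{ratio}}}{3i + |\partial\sigma|}.$$

Then by the inequality $1 + x \le e^x$ for all $x \in \mathbb{R}$,

$$\Pr[\vec{m} \text{ is never tiled}] = \prod_{i=p}^{\infty} \left(1 - p_{\alpha_i}(\vec{m})\right) \le \prod_{i=p}^{\infty} e^{-p_{\alpha_i}(\vec{m})} \le \prod_{i=p}^{\infty} e^{-\frac{f_{\mathrm{ratio}}}{3i + |\partial\sigma|}} = e^{-f_{\mathrm{ratio}} \sum_{i=p}^{\infty} \frac{1}{3i + |\partial\sigma|}},$$

which is equal to 0 since the sum is a divergent general harmonic series. □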
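The divergence that drives this argument can be checked numerically. The sketch below, with hypothetical function names and illustrative parameter values, evaluates the partial product that bounds the probability of a frontier location staying untiled, together with the exponential of the partial harmonic sum that dominates it.

```python
import math


def never_tiled_upper_bound(f_ratio: float, frontier0: int, p: int, stages: int) -> float:
    """Partial product of (1 - f_ratio / (3i + frontier0)) for i = p, ..., p + stages - 1.

    This bounds the probability that a location on the frontier at stage p is still
    untiled after `stages` further attachments. Requires f_ratio < 3p + frontier0.
    """
    log_product = 0.0
    for i in range(p, p + stages):
        log_product += math.log1p(-f_ratio / (3 * i + frontier0))
    return math.exp(log_product)


def harmonic_exponent_bound(f_ratio: float, frontier0: int, p: int, stages: int) -> float:
    """exp(-f_ratio * sum of 1/(3i + frontier0)), which dominates the product above."""
    partial_sum = sum(1.0 / (3 * i + frontier0) for i in range(p, p + stages))
    return math.exp(-f_ratio * partial_sum)


if __name__ == "__main__":
    # Illustrative values only: f_ratio = 0.1 and an initial frontier of 4 locations.
    for stages in (10**2, 10**4, 10**6):
        print(stages, never_tiled_upper_bound(0.1, 4, 0, stages))
```

The bound decreases only polynomially in the number of stages, since the harmonic sum grows logarithmically, but it does tend to 0, which is all the observation requires.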
3 Constructing a Square using O(1) Tile Types by Tile Concentration Programming

This section is devoted to proving the following theorem, which is the main result of this paper.
For all $\delta > 0$ and $n \in \mathbb{N}$, define

$$r_\delta = \left\lceil \frac{\log(\cdots)}{\log 0.9421} \right\rceil, \qquad c_\delta = 2 + \left\lceil \frac{\log(\cdots)}{\log 0.717} \right\rceil, \qquad k_n = \left\lceil \frac{\lfloor \log n \rfloor + 1}{3} \right\rceil,$$

and define $b(n, \delta) = \max\{ r_\delta, 2^{\cdots} \} + c_\delta + 3k_n$.

Theorem 3.1. For all $\delta > 0$, there is a tile assembly system $\mathcal{T}_\delta = (T, \sigma, 2)$ such that, for all
integers $n > b(n, \delta)$, there is a tile concentration assignment $\rho_n : T \to [0, \infty)$ such that the set
$\{\, (x, y) \mid x, y \in \{1, \ldots, n\} \,\}$ strictly self-assembles in $\mathcal{T}_\delta(\rho_n)$ with probability at least $1 - \delta$.

Note that for any fixed $\delta > 0$, $b(n, \delta) = O(n^{2/3})$ (where the constant in the $O()$ depends on $\delta$),
whence $n > b(n, \delta)$ for all sufficiently large $n$.



3.1 Intuitive Idea of the Construction 

Kao and Schweller introduced a basic primitive in [11] (refining a lower-precision technique described
in [2]), called a sampling line. The sampling line allows tile concentrations to encode a natural
number whose binary representation can be probably approximately reproduced. Kao and Schweller
utilize the sampling line to encode $n \in \mathbb{N}$ by an approximation $n' \in \mathbb{N}$ such that $(1 - \epsilon)n \le n' \le
(1 + \epsilon)n$ with probability at least $1 - \delta$.

The idea of our construction is as follows. We will "approximate" only numbers m small enough
that the sampling line approximation has sufficient space to be an exact computation of m with
high probability. The construction of Kao and Schweller can be thought of as estimating n by, in
a sense, probabilistically counting to n using independent Bernoulli trials with appropriately fixed
success probability; i.e., the probabilities are used to estimate an approximate unary encoding of n,
which is converted to binary by a counter. Representing n in unary, of course, takes space n, and
recovering it probabilistically from tiles subject to randomization requires using much more than
space n to overcome the error introduced by randomization. Kao and Schweller use an ingenious
technique to spread this estimation out into the center of the n x n square being built, affording
$O(n^2)$ space to approximate n closely. However, that construction lacks the space to compute
n exactly, which requires much more than $n^2$ Bernoulli trials - applying the standard Chernoff
bound to the Kao-Schweller sampling line achieves an upper bound of $O(n^5)$ trials - to achieve a
sufficiently small estimation error. Hence, attempting to use a sampling line directly to compute n
would result in a line containing many more tiles than the $n^2$ tiles that compose an n x n square,
and no amount of twisting the line will cause it to fit inside the boundaries of the square.

We split n's binary expansion $b(n) = b_1 b_2 \cdots b_{\lfloor \log n \rfloor + 1} \in \{0, 1\}^*$ into three subsequences $b_1 b_4 b_7 \ldots$,
$b_2 b_5 b_8 \ldots$, and $b_3 b_6 b_9 \ldots$, each of length about $\frac{1}{3} \log n$, and interpret these binary strings as natural
numbers $m_1, m_2, m_3$, each roughly $n^{1/3}$ in magnitude, to be estimated. The problem of estimating n is reduced to that
of estimating these three numbers. At the same time, we introduce a new sampling line technique
that can exactly estimate a number m with high probability using only $O(m^2)$ trials.⁴ Since

⁴As opposed to the $O(m^5)$ trials that would be required by the Kao-Schweller sampling line. It is possible to
use Kao and Schweller's original sampling line to estimate seven numbers - $\lfloor \log n \rfloor + 1$ (the length of the binary
expansion of n), and the six numbers $m_1, \ldots, m_6$ encoded by length-$\lceil (\lfloor \log n \rfloor + 1)/6 \rceil$ substrings of n's binary expansion, each
small enough that $m_i^5 = o(n)$ - and to use these numbers to reconstruct n and from that, build an n x n square. A
$m_1, m_2, m_3 < n^{1/3}$, estimating $m_1$, $m_2$, and $m_3$ will require $O(n^{2/3})$ trials, which fits within the
width of an n x n square for sufficiently large n.

Intuitively, the reason that estimating $m_1$, $m_2$, and $m_3$ creates an improvement over estimating
n directly is that the space needed for the unary encodings of numbers whose binary length is
one-third that of n's does not scale linearly with that length; the unary encoding of these numbers
scales with $n^{1/3}$, not $n/3$, whence a quadratic increase in the space needed for probabilistic recovery
remains sufficiently small ($O(n^{2/3})$) that three such decodings easily fit into space n.
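The interleaved splitting of n's binary expansion, and its inverse, can be sketched in a few lines of Python. The function names `split_bits` and `merge_bits` are hypothetical, and passing the bit length explicitly to the inverse is an assumption of this sketch rather than a description of how the tile construction recovers it.

```python
from typing import List, Tuple


def split_bits(n: int) -> Tuple[List[int], int]:
    """Split b_1 b_2 ... (most significant bit first) into the three interleaved
    subsequences b_1 b_4 b_7 ..., b_2 b_5 b_8 ..., b_3 b_6 b_9 ...

    Returns the three subsequences read as natural numbers, plus the bit length of n.
    """
    bits = bin(n)[2:]
    parts = [bits[j::3] for j in range(3)]
    return [int(s, 2) if s else 0 for s in parts], len(bits)


def merge_bits(ms: List[int], length: int) -> int:
    """Inverse of split_bits: re-interleave three numbers into a length-bit number."""
    # Number of bits each subsequence contributes to a string of the given length.
    lens = [len(range(j, length, 3)) for j in range(3)]
    parts = [format(m, "b").zfill(l) if l else "" for m, l in zip(ms, lens)]
    interleaved = [parts[i % 3][i // 3] for i in range(length)]
    return int("".join(interleaved), 2)


if __name__ == "__main__":
    ms, length = split_bits(53)        # 53 is 110101 in binary
    print(ms, length)                  # subsequences 11, 10, 01
    print(merge_bits(ms, length))
```

Each of the three numbers has at most a third of the bits of n (rounded up), which is what keeps its unary encoding on the order of the cube root of n.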

3.2 Probabilistic Decoding of a Natural Number using a Sampling Line 

In this section, we describe how to exactly compute a positive integer m probabilistically from tile
concentrations that are appropriately programmed to represent m. In our final construction, the
sampling line will estimate not one but three integers $m_1$, $m_2$, and $m_3$, as described in Section 3.1, by
embedding additional bits into the tiles. However, for the sake of clarity, in this section, we describe
how to estimate a single positive integer m, and then describe in Section 3.2.2 how to modify the
construction and set the probabilities to allow three numbers to be estimated simultaneously on a
single sampling line.
Figure 1: The portion of the basic Kao-Schweller sampling line that controls its length. Two tiles compete
nondeterministically to bind to the right of the line, one of which stops the growth, while the other continues,
giving the length of the line a geometric distribution.



The basic length-controlling portion of the Kao-Schweller sampling line is shown in Figure 1.⁵
A horizontal row of tiles forms to the right of the seed. Two tiles, G ("go") and S ("stop"),
nondeterministically connect to the right end of the line; G continues the growth, while S stops
the growth. If S has concentration $p \in [0, 1]$ and G has concentration $1 - p$, then the length L of
the line is a geometric random variable with expected value $1/p$. By setting p appropriately, E[L]

straightforward and tedious analysis of the constants involved reveals that such a technique can be used to construct
n x n squares for n > 10^^. We achieve much more feasible bounds on n ($\approx$ 10^ for $\delta = 0.01$) using the techniques
introduced in this paper, and indeed, better bounds than those required by Kao and Schweller to approximate n,
whose construction achieves, for instance, a (0.01, 0.01)-approximation only for n > 10^^, according to their analysis.
⁵Our description of the Kao-Schweller sampling line is incomplete, as discussed in the next paragraph.
can be controlled, but not precisely, since a geometric random variable may have a deviation from
the expected value that is too large for our purposes.
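To make the size of this deviation concrete, the following Python sketch computes the exact probability that a single-stage line, whose length is geometric with stop probability p, ends in a target interval from 2^(a-1) up to but excluding 2^a, and searches a grid of values of p for the best case. The function names and the grid search are illustrative assumptions, not part of the construction.

```python
def geometric_interval_prob(p: float, a: int) -> float:
    """Pr[2^(a-1) <= L < 2^a] for a geometric L with success probability p.

    Uses Pr[L >= k] = (1 - p)^(k - 1), where L counts trials up to and including
    the first success (the stop tile).
    """
    lo, hi = 2 ** (a - 1), 2 ** a
    q = 1.0 - p
    return q ** (lo - 1) - q ** (hi - 1)


def best_single_stage(a: int, grid: int = 10000) -> float:
    """Largest interval probability over stop probabilities p = k/grid, 0 < k < grid."""
    return max(geometric_interval_prob(k / grid, a) for k in range(1, grid))


if __name__ == "__main__":
    print(best_single_stage(10))
```

Writing x for (1 - p) raised to the power 2^(a-1), the interval probability is (x - x^2)/(1 - p), so no choice of p pushes it much above 1/4. A single stop tile therefore cannot place the length in the desired interval with high probability.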

Kao and Schweller allow a third tile type to bind within the sampling line, which does the 
actual sampling for computing a natural number, but our construction splits this sampling into a 
separate set of tiles that forms above the line. The sampling portion is discussed in Section 3.2.2. 
For the present time, we restrict our discussion to controlling the length of the line. 



3.2.1 More Precisely Controlling the Sampling Line Length 

Our goal is to control L, the length of the sampling line, such that, by setting tile concentrations
appropriately, we may ensure that L lies between $2^{a-1}$ and $2^a$ with high probability, for an $a \in \mathbb{Z}^+$
of our choosing (which will be influenced by the number n we are estimating). That is, we may
ensure that the number of bits required to represent L is computed precisely, even if the exact value
of L varies widely within the interval $[2^{a-1}, 2^a)$. We then attach a counter - a group of tiles that
measures the length of the line by counting in binary - to the north of the line that measures L
until the final stopping tile. The stop signal is not intended to stop the counter immediately, but
rather to signal that the counter should continue until it reaches the next power of 2 - i.e., the next
time a new most significant bit is required - and then stop. Hence, we may choose an arbitrary
power of 2 and set tile concentrations to ensure that the counter counts to that value and then
stops.
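The effect of letting the counter run on to the next power of 2 can be stated as a one-line function. The sketch below uses the hypothetical name `counter_stop_value` and models only the arithmetic, not the tiles: every stop position in the target interval yields the same final counter value.

```python
def counter_stop_value(stop_position: int) -> int:
    """Value at which the counter halts when the stop tile appears at stop_position.

    The counter runs on to the next power of 2, i.e., the next value needing a new
    most significant bit, so any stop position L with 2^(a-1) <= L < 2^a gives 2^a.
    """
    if stop_position < 1:
        raise ValueError("the sampling line has length at least 1")
    return 1 << stop_position.bit_length()


if __name__ == "__main__":
    # All stop positions from 512 through 1023 lead to the same final value, 1024.
    print({counter_stop_value(L) for L in range(512, 1024)})
```

This is why only the number of bits of L needs to be controlled, not L itself.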



Figure 2: The portion of the sampling line of our construction that controls its length. Each of the r stages
has expected length 1/p, making the expected total length r/p, but more tightly concentrated about that
expected length than in the case of one stage. In each stage, the "go" tile has concentration 1 - p and the
"stop" tile has concentration p.



To increase the precision with which we control L, we use not one but many stages of "go"
and "stop" tiles, $G_1, S_1, G_2, S_2, \ldots, G_r, S_r$. The construction is shown in Figure 2. $G_i$ and $S_i$ each
compete to bind to the right of $S_{i-1}$ and $G_i$. $S_i$ signals a transition to the next stage $i + 1$, with $S_r$
stopping the growth of the line after r stages. Therefore, the sequence of tiles to the right of the
seed is a string described by the regular expression $G_1^* S_1 G_2^* S_2 \cdots G_r^* S_r$. Each $S_i$ has concentration
p, and the remaining $G_i$ tiles each have concentration $1 - p$. The length L of the line is a negative
binomial random variable⁶ with parameters r, p (see [14]) with expected value r/p by linearity of

⁶The term negative is misleading; a negative binomial random variable is better described (informally) as the
inverse of a binomial random variable, if one thinks of a binomial random variable as being like a function that 
maps a number of Bernoulli trials to a number of successes. A negative binomial random variable maps a number of 
successes to the number of trials necessary to achieve that number of successes. 
expectation; i.e., its length is the number of Bernoulli trials required before exactly r successes,
provided each Bernoulli trial has success probability p.
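A quick Monte Carlo sketch in Python shows how much more tightly the multi-stage line concentrates than a single-stage line with the same expected length. The function names, the seeded generator, and the trial counts are illustrative assumptions; the value 0.442695 is the constant that the analysis of this section uses to place the expected length inside the target interval.

```python
import random


def sample_line_length(r: int, p: float, rng: random.Random) -> int:
    """Number of tiles placed until the r-th stop tile attaches.

    Each tile is one Bernoulli trial: with probability p it is the current
    stage's stop tile, otherwise it is that stage's go tile.
    """
    length = 0
    for _ in range(r):
        while True:
            length += 1
            if rng.random() < p:
                break
    return length


def fraction_in_interval(r: int, p: float, lo: int, hi: int,
                         trials: int, seed: int = 0) -> float:
    """Empirical estimate of Pr[lo <= L < hi] over the given number of trials."""
    rng = random.Random(seed)
    hits = sum(lo <= sample_line_length(r, p, rng) < hi for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    a = 10
    expected_length = (1 + 0.442695) * 2 ** (a - 1)
    for r in (1, 10, 113):
        p = r / expected_length
        print(r, fraction_in_interval(r, p, 2 ** (a - 1), 2 ** a, trials=2000))
```

With one stage the line lands in the interval roughly a quarter of the time, while with 113 stages it does so in nearly every trial.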

Let $N, R \in \mathbb{N}$ and $p \in [0, 1]$. A binomial random variable $\mathcal{B}(N,p)$ (the
number of successes after $N$ Bernoulli trials, each having success probability $p$) is related to a
negative binomial random variable $\mathcal{N}(R,p)$ (the number of trials needed to obtain exactly
$R$ successes) by the relationships

$\Pr[\mathcal{N}(R,p) \le N] = \Pr[\mathcal{B}(N,p) \ge R]$ and    (3.1)
$\Pr[\mathcal{N}(R,p) > N] = \Pr[\mathcal{B}(N,p) < R]$.    (3.2)

Thus, Chernoff bounds that provide tail bounds for binomial distributions can be applied to negative
binomial distributions via (3.1) and (3.2).
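Relationship (3.1) can be checked exactly for small parameters. The sketch below uses rational arithmetic and assumes R >= 1; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def pr_binomial_at_least(n_trials, p, r):
    """Pr[B(n_trials, p) >= r]: at least r successes in n_trials trials."""
    return sum(comb(n_trials, s) * p**s * (1 - p)**(n_trials - s)
               for s in range(r, n_trials + 1))


def pr_negbin_at_most(r, p, n_trials):
    """Pr[N(r, p) <= n_trials]: the r-th success occurs by trial n_trials.

    The r-th success lands on trial t with probability
    C(t-1, r-1) * p^r * (1-p)^(t-r).
    """
    return sum(comb(t - 1, r - 1) * p**r * (1 - p)**(t - r)
               for t in range(r, n_trials + 1))
```

Relationship (3.2) follows by taking complements of both sides.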

To cause $L$ to fall in the interval $[2^{a-1}, 2^a)$, we must set its expected length $\bar{L}$ (by
setting $p = r/\bar{L}$) to be such that the $r$-th success occurs when the line has length in the
interval $[2^{a-1}, 2^a)$. Note that $pN$ is the expected number of successes in the first $N$ tiles of
the line; i.e., it is the expected number of successes in exactly $N$ Bernoulli trials.

We define $\epsilon$ and $\epsilon'$ so that $\bar{L} = (1 + \epsilon)2^{a-1} = (1 - \epsilon')2^a$
and the two error probabilities derived below are approximately equal; $\epsilon \approx 0.442695$ and
$\epsilon' \approx 0.2786525$ suffice. The event that $L < 2^{a-1}$ is equivalent to the event that
$2^{a-1}$ Bernoulli trials are conducted (with expected number of successes $p2^{a-1}$) with at least
$r$ successes. By (3.1) and the Chernoff bound [14, Theorem 4.4, part 1],

$\Pr[L < 2^{a-1}] = \Pr[\mathcal{B}(2^{a-1},p) \ge (1+\epsilon)p2^{a-1}]
  \le \left( \frac{e^{\epsilon}}{(1+\epsilon)^{1+\epsilon}} \right)^{p2^{a-1}}
  = \left( \frac{e^{\epsilon}}{(1+\epsilon)^{1+\epsilon}} \right)^{r/(1+\epsilon)}
  = \left( \frac{e^{\epsilon/(1+\epsilon)}}{1+\epsilon} \right)^{r} < 0.9421^r.$

The event that $L > 2^a$ is equivalent to the event that $2^a$ Bernoulli trials are conducted (with
expected number of successes $p2^a$) with fewer than $r$ successes. To bound the probability that $L$
is too large, we use (3.2) and the Chernoff bound for deviations below the mean [14, Theorem 4.5,
part 1],

$\Pr[L > 2^a] = \Pr[\mathcal{B}(2^a,p) < (1-\epsilon')p2^a]
  \le \left( \frac{e^{-\epsilon'}}{(1-\epsilon')^{1-\epsilon'}} \right)^{p2^a}
  = \left( \frac{e^{-\epsilon'}}{(1-\epsilon')^{1-\epsilon'}} \right)^{r2^a/((1+\epsilon)2^{a-1})}
  = \left( \frac{e^{-\epsilon'}}{(1-\epsilon')^{1-\epsilon'}} \right)^{2r/(1+\epsilon)} < 0.9421^r.$

By the union bound,

$\Pr[L \notin [2^{a-1}, 2^a)] < 2 \cdot 0.9421^r.$    (3.3)

Therefore, by setting $r$ sufficiently large, we can exponentially decrease the probability that $L$
falls outside the range $[2^{a-1}, 2^a)$, independently of $a$. For example, letting $r = 113$ leads to
$\Pr[L \notin [2^{a-1}, 2^a)] < 0.0025$. Since $r$ is a constant depending only on $\delta$, it can be
encoded into the tile types as shown in Figure 2.
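The constants above can be reproduced numerically. The sketch below makes one assumption that is not stated in the text: it takes 1 + eps = 1/ln 2, a choice that matches the quoted values of eps and eps' and makes the two Chernoff bases coincide. All names are illustrative.

```python
import math

# Assumption: 1 + eps = 1 / ln 2 (matches eps ~ 0.442695 quoted in the text).
EPS = 1 / math.log(2) - 1
# From (1 + eps) * 2^(a-1) = (1 - eps') * 2^a.
EPS_PRIME = 1 - (1 + EPS) / 2


def upper_tail_base(eps):
    """Per-stage base of the bound on Pr[L < 2^(a-1)]."""
    return math.exp(eps / (1 + eps)) / (1 + eps)


def lower_tail_base(eps_prime):
    """Per-stage base of the bound on Pr[L > 2^a]."""
    return math.exp(-eps_prime / (1 - eps_prime)) / (1 - eps_prime)


def length_failure_bound(r, base=0.9421):
    """Right-hand side of (3.3)."""
    return 2 * base ** r
```

With this choice both bases evaluate to roughly 0.94208, just below the constant 0.9421 used in (3.3).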

3.2.2 Computing a Number Exactly using a Sampling Line 

As stated previously, our goal is that, with a sampling line of length $O(m^2)$, we can exactly
compute a number $m$. The idea is shown in Figure 3, and is inspired by the sampling line of
Kao and Schweller [11] but can estimate a number more precisely using a given length, as well as
having a length that is itself controlled more precisely by the technique of Section 3.2.1. The
length-controlling portion of the sampling line of length $L$ will control a counter placed above the
sampling line, which counts to the next power of 2 greater than $L$, $2^a$. This counter will
eventually end up with $a$ total bits before stopping. Let $k$ be the maximum number of bits needed to
represent $m$ ($k$ will be about $\frac{1}{3} \log n$ in our application), and let $l = a - k$. We form
a row above the row described in Section 3.2.1, which does the sampling. To implement the Bernoulli
trials that estimate $m$, one of two tiles $A$ (the gray tile in Figure 3) or $B$ (the white tile in
Figure 3) nondeterministically binds to every position of this row. Set the concentration of $A$ to be
$\frac{m2^l + 2^{l-1}}{2^a}$ and the concentration of $B$ to be $1 - \frac{m2^l + 2^{l-1}}{2^a}$. We
embed a second counter - the sampling counter - within the primary counter. Whenever $A$ appears, the
sampling counter increments, and when $B$ appears it does not change. Let $M$ be the random variable
representing the final value of the sampling counter. Then $M$ is a binomial random variable with
$E[M] = m2^l + 2^{l-1}$.

[Figure 3 graphic. Labels: "m in binary"; "most significant k bits"; "least significant l bits
ignored"; "sampling row"; "growth"; "signal to stop at next power of 2".]

Figure 3: Computing the natural number m = 2 from tile concentrations using a sampling line. For
brevity, glue strengths and labels are not shown. Each column increments the primary counter,
represented by the bits on the left of each tile, and each gray tile increments the sampling counter,
represented by the bits on the right of each tile. The number of bits at the end is l + k, where c is a
constant coded into the tile set, and k depends on m, and l = k + c. The most significant k bits of the
sampling counter encode m. In this example, k = 2 and c = 1.
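The sampling counter can be sketched as a direct simulation. This is an illustration under stated assumptions rather than the tile system: it takes a = 2k + c and l = k + c (the relation given in the caption of Figure 3), draws each sampling tile independently, and uses hypothetical function names.

```python
import random
from fractions import Fraction


def concentration_of_A(m, k, c):
    """Concentration of the gray tile A: (m * 2^l + 2^(l-1)) / 2^a."""
    a = 2 * k + c          # assumed total number of counter bits
    l = a - k              # low-order bits that absorb the sampling error
    return Fraction(m * 2**l + 2**(l - 1), 2**a)


def estimate_m(m, k, c, rng):
    """Run 2^a Bernoulli trials and read off the top k bits of the count M."""
    a = 2 * k + c
    l = a - k
    p_a = float(concentration_of_A(m, k, c))
    big_m = sum(1 for _ in range(2**a) if rng.random() < p_a)
    return big_m >> l      # discard the l least significant bits
```

For the parameters of Figure 3 (m = 2, k = 2, c = 1) the concentration of A comes out to 5/8. With a larger c, such as c = 6, the decoded value should equal m in essentially every run.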

We will choose $k$ and $l$ so that the most significant $k$ bits of the sampling counter will almost
certainly represent $m$. Intuitively, the least significant $l$ bits of $M$ "absorb" the error. This
will occur if $m2^l \le M < (m + 1)2^l$. Note that $m < 2^k$. Let $\epsilon = \frac{1}{2m+1}$. Then the
Chernoff bound [14, Theorems 4.4/4.5, part 2] and the union bound tell us that

$\Pr[M \ge (m + 1)2^l \text{ or } M < m2^l]$
$= \Pr[M \ge (1 + \epsilon)E[M] \text{ or } M < (1 - \epsilon)E[M]]$
$\le e^{-\epsilon^2 E[M]/3} + e^{-\epsilon^2 E[M]/2}$
$= e^{-2^{l-1}/(3(2m+1))} + e^{-2^{l-1}/(2(2m+1))}$
$< e^{-2^{l-k-2}/3} + e^{-2^{l-k-2}/2}.$

Let $c \in \mathbb{N}$ be a constant. By setting $l = k + c$, the probability of error decreases
exponentially in $c$:

$\Pr[M \ge (m + 1)2^l \text{ or } M < m2^l] < e^{-2^{c-2}/3} + e^{-2^{c-2}/2} < 2 \cdot 0.717^{2^{c-2}}.$    (3.4)

For instance, letting $c = 6$ bounds the left-hand side of (3.4) below 0.0052.

The number of samples is $2^a = 2^{2k+c} = O((2^k)^2)$. Since $m < 2^k$, integers $m$ such that
$m^2 \ll n$ can be "probably exactly computed" using far fewer than $n$ Bernoulli trials, and can
therefore be computed by a sampling line without exceeding the boundaries of an n x n square.
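The bound (3.4) is simple to evaluate for a given c. A minimal sketch, with hypothetical function names, that also reports the number of samples 2^(2k+c):

```python
import math


def sampling_error_bound(c):
    """Middle expression of (3.4): e^(-2^(c-2)/3) + e^(-2^(c-2)/2)."""
    x = 2 ** (c - 2)
    return math.exp(-x / 3) + math.exp(-x / 2)


def simplified_error_bound(c):
    """Right-hand side of (3.4): 2 * 0.717^(2^(c-2))."""
    return 2 * 0.717 ** (2 ** (c - 2))


def number_of_samples(k, c):
    """Length 2^a of the sampling row, with a = 2k + c."""
    return 2 ** (2 * k + c)
```

For c = 6 the first function evaluates to roughly 0.00516, consistent with the figure of 0.0052 quoted above.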

3.3 Computing n Exactly 

We have shown how to compute a number $m$ exactly using a sampling line of length $O(m^2)$ and
height $O(\log m)$. To compute $n$, the dimensions of the square, we must compute $m_1$, $m_2$, and
$m_3$, which are the numbers represented by the bits of the binary expansion of $n$ at positions
congruent to 1 mod 3, 2 mod 3, and 0 mod 3, respectively. To compute all three of these numbers, we
embed two extra sampling counters into the double counter, in addition to the sampling counter
described in Section 3.2, to create a quadruple counter. This requires 8 sampling tiles instead of 2,
in order to represent each of the possible outcomes of conducting three simultaneous Bernoulli trials,
each trial used for estimating one of $m_1$, $m_2$, or $m_3$.

Given $i \in \{1, 2, 3\}$, let $b_i \in \{0, 1\}$ denote the outcome of the $i$-th of three
simultaneous Bernoulli trials, and let $p_i(b_i)$ denote the probability we would like to associate
with that outcome. As noted in Section 3.2.2, the values of the $p_i$'s are given by
$p_i(1) = \frac{m_i 2^l + 2^{l-1}}{2^a}$ and $p_i(0) = 1 - p_i(1)$.

Since each of the three simultaneous Bernoulli trials is independent, we can calculate the appropriate
concentration of the tile representing the three outcomes by multiplying the three outcome
probabilities together. Then the required concentration of the tile representing outcomes
$b_1, b_2, b_3$ is given by $p_1(b_1) \cdot p_2(b_2) \cdot p_3(b_3)$.

Once the values $m_1$, $m_2$, and $m_3$ are computed, we must remove the $c$ least significant (bottom)
bits from the bottom of the primary counter. Since $c$ is a constant depending only on $\delta$, it can
be encoded into the tile types. We must then remove the bottom half of the remaining bits.* At this
point, the concatenation of the bits on the tiles represents the binary expansion of $n$. Rather than
expand them out to use three times as many tiles, we simply translate each of them to an octal
digit, giving the octal representation of $n$, with one octal digit per tile replacing the three bits
per tile. Finally, this representation of $n$ is rotated 90 degrees counter-clockwise, used as the
initial value for a decrementing, upwards-growing, base-8 counter, and used to fill in an n x n square
using the standard construction [17]. Rotating $n$ to face up starts the counter $2k + 2$ tiles from
the bottom of the construction so far. Furthermore, testing whether the counter has counted below 0
requires counting once beyond 0, using 2 more rows than the starting value of the counter. Therefore,
to ensure that exactly an n x n square is formed, the value $n - 2k - 4$, rather than $n$ exactly, is
programmed into the tile concentrations to serve as the start value of the upwards-growing counter.
An outline of this construction is shown in Figure 4.
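The bit interleaving and the eight tile concentrations can be sketched as follows. The bit ordering is an assumption taken from the worked example in the caption of Figure 4 (each tile carries one bit of each of m_1, m_2, m_3, most significant tile first), as is the relation a = 2k + c; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def split_into_three(value, k):
    """Split a 3k-bit value into (m1, m2, m3), one bit of each per tile."""
    bits = format(value, "0{}b".format(3 * k))
    return int(bits[0::3], 2), int(bits[1::3], 2), int(bits[2::3], 2)


def octal_digits(m1, m2, m3, k):
    """Translate each tile's three bits into one octal digit."""
    return [(((m1 >> i) & 1) << 2) | (((m2 >> i) & 1) << 1) | ((m3 >> i) & 1)
            for i in range(k - 1, -1, -1)]


def tile_concentrations(ms, k, c):
    """Concentration of each of the 8 sampling tiles, keyed by (b1, b2, b3)."""
    a = 2 * k + c
    l = a - k
    p_one = [Fraction(m * 2**l + 2**(l - 1), 2**a) for m in ms]
    conc = {}
    for outcome in product((0, 1), repeat=3):
        prob = Fraction(1)
        for p, b in zip(p_one, outcome):
            prob *= p if b else 1 - p   # independent trials multiply
        conc[outcome] = prob
    return conc
```

Applied to the example of Figure 4, the start value 871 - 2*4 - 4 = 859 splits into (4, 3, 15) and translates to the octal digits 1, 5, 3, 3.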

*Isolating the most significant half of the bits can be done using a tile set similar to the algorithm
one might use to program a single-tape Turing machine to compute the function $0^{2n} \mapsto 0^n$.

Figure 4: High-level overview of the entire construction, not at all to scale. For brevity, glue
strengths and labels are not shown. The double counter number estimator of Figure 3 is embedded with
two additional counters to create a quadruple counter estimating $m_1$, $m_2$, and $m_3$, shown as a
box labeled as "Figure 3" in the above figure. In this example, $m_1 = 4$, $m_2 = 3$, and $m_3 = 15$,
represented vertically in binary in the most significant 4 tiles at the end of the quadruple counter.
Concatenating the bits of the tiles results in the string 001101011011, the binary representation of
859, which equals $n - 2k - 4$ for $n = 871$, so this example builds an 871 x 871 square. (Actually,
871 is too small to work with our construction, so the counter will exceed length 871, but we choose
small numbers to illustrate the idea more clearly.) Once the counter ends, c tiles (c = 3 in this
example) are shifted off the bottom, and the top half of the tiles are isolated (k = 4 in this
example). Each remaining tile represents three bits of n, which are converted into octal digits,
rotated to face upwards, and then used to initialize a base-8 counter that builds the east wall of the
square. Filler tiles cover the remaining area of the square.

3.4 Choice of Parameters

We now derive the settings of various parameters required to achieve a desired success probability
and derive lower bounds on $n$ necessary to allow the space required by the construction. To ensure
probability of failure at most $\delta$, we pick $r$, the number of stages of stopping tiles that must
attach before the primary counter is sent the stop signal, so that $2 \cdot 0.9421^r \le \delta/4$ as
in (3.3):

$r = \left\lceil \frac{\log(\delta/8)}{\log 0.9421} \right\rceil.$

For example, choosing $r = 113$ achieves probability of error $\delta/4$ (in ensuring the counter stops
between the numbers $2^{a-1}$ and $2^a$) at most 0.0025.

To ensure that each of $m_1$, $m_2$, and $m_3$ is computed exactly, we set $c$, the number of extra
bits used in the primary counter beyond $2k$, such that
$e^{-2^{c-2}/3} + e^{-2^{c-2}/2} \le \delta/4$, as in (3.4), or more simply, such that
$2 \cdot 0.717^{2^{c-2}} \le \delta/4$; i.e., set

$c = \left\lceil 2 + \log\left( \frac{\log(\delta/8)}{\log 0.717} \right) \right\rceil.$

For example, choosing $c = 7$ achieves probability of error $\delta/4$ (in ensuring that $m_i$ is
computed correctly) at most 0.0025 (in fact, at most 0.00005).

By the union bound, the length of the sampling line and the values of $m_1$, $m_2$, and $m_3$ are
computed with sufficient precision to compute the exact value of $n$ with probability at least
$1 - \delta$. The example values of $r$ and $c$ given above achieve $\delta \le 0.01$.
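The two parameter choices can be computed from the target failure probability. A minimal sketch, assuming the outer logarithm in the expression for c is base 2 and using hypothetical function names:

```python
import math


def stages_needed(delta):
    """Smallest r with 2 * 0.9421^r <= delta / 4."""
    return math.ceil(math.log(delta / 8) / math.log(0.9421))


def extra_bits_needed(delta):
    """Smallest c with 2 * 0.717^(2^(c-2)) <= delta / 4."""
    ratio = math.log(delta / 8) / math.log(0.717)
    return math.ceil(2 + math.log2(ratio))
```

For delta = 0.01 these return r = 113 and c = 7, the example values used in this section.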

The choices of $r$ and $c$ imply a lower bound on the value of $n$ necessary to allow sufficient
space to carry out the construction. Clearly the counter must reach at least value $r$, since there
are $r$ different stopping stages. The more influential factor will be the value $c$, which doubles the
space necessary to run the counter each time it is incremented by 1. $n$ requires
$\lfloor \log n \rfloor + 1$ bits to represent, but our estimation will be a string of length the next
highest multiple of 3 above $\lfloor \log n \rfloor + 1$. Therefore, each of $m_1$, $m_2$, and $m_3$
requires

$k = \left\lceil \frac{\lfloor \log n \rfloor + 1}{3} \right\rceil$

bits to represent. Recall that the primary counter will have height $2k + c$ and count to $2^{2k+c}$
(so long as $r < 2^{2k+c}$). Then, $c$ columns are required to shift off the constant $c$ bits from the
least significant bits of the counter, and $2k$ columns are required to shift off the least significant
half of the bits of the counter to isolate the $k$ most significant bits. $k$ columns are needed to
translate the groups of three bits into octal and to rotate this string to face upwards for the
square-building counter.

Hence, the total length required along the bottom of the square to compute $n$ is
$\max\{r, 2^{2k+c}\} + c + 3k$. Expanding out the definitions of $r$, $k$, and $c$ derived above gives
the lower bound $b(n, \delta)$ on $n$ described in Theorem 3.1.

For sufficiently large $n$ and small enough $\delta$, $r$ is much smaller than $2^{2k+c}$, so the
latter term dominates. For example, to achieve probability of error $\delta \le 0.01$ requires
$n > 8,000,000$. According to preliminary experimental tests, in practice, a smaller value of $c$ is
required than the theoretical bounds we have derived. For example, if the desired error probability is
$\delta = 0.01$, setting $c = 7$ satisfies the analysis given above, but in experimental simulation,
$c = 3$ appears to suffice for probability of error at most 0.01, and reduces the space requirements
by a factor of $2^{7-3} = 16$. In this case, $n = 9000$ can be computed by a construction that will
stay within the 9000 x 9000 square.

A simulated implementation of this tile assembly system using the ISU TAS Tile Assembly
Simulator [15] is available at http://www.cs.iastate.edu/~lnsa/software.html. The tile set
uses approximately 4500 + 9c + 4r tile types, where $r$ and $c$ are calculated from $\delta$ as above.
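The space requirement derived in this section can be evaluated for concrete values of n. The sketch below assumes the length expression max{r, 2^(2k+c)} + c + 3k as reconstructed above; the function names are hypothetical.

```python
def bits_per_counter(n):
    """k = ceil((floor(log2 n) + 1) / 3), the bits needed for each m_i."""
    return -(-n.bit_length() // 3)    # int.bit_length() is floor(log2 n) + 1


def required_length(n, r, c):
    """Length needed along the bottom of the square to compute n."""
    k = bits_per_counter(n)
    return max(r, 2 ** (2 * k + c)) + c + 3 * k


def fits_in_square(n, r, c):
    """True if the computation of n stays within the n x n square."""
    return required_length(n, r, c) <= n
```

Under these assumptions, n = 9000 with c = 3 needs length 8210 and fits, whereas the same n with c = 7 would need 16 times as many counter columns.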

4 Assembly of Finite Scaled Shapes 

Soloveichik and Winfree [23] studied the self-assembly of scaled shapes, in particular studying the
complexity of assembling "scaled-up versions" of finite shapes, as measured by the number of tile
types needed to uniquely assemble a scaled shape.

Formally, define a shape to be a connected set $S \subseteq \mathbb{Z}^2$. For $c \in \mathbb{Z}^+$,
define the c-scaling of $S$ to be the set

$S^c = \{ (x, y) \in \mathbb{Z}^2 \mid (\lfloor x/c \rfloor, \lfloor y/c \rfloor) \in S \}.$

Intuitively, $S^c$ is $S$ "magnified by factor c". If we imagine that we would like to assemble $S$,
but we compromise on assembling $S^c$ instead, then $c$ is referred to as the resolution loss.
Soloveichik and Winfree proved that for every finite shape $S$, there is a scaling factor $c$ and a TAS
$\mathcal{T}$ such that $\mathcal{T}$ uniquely assembles $S^c$, and the number of tile types in
$\mathcal{T}$ is within a multiplicative logarithmic factor of the Kolmogorov complexity of $S$,
measured as the length in bits of the shortest program outputting a list of the coordinates of $S$.
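The definition of the c-scaling translates directly into code. A minimal sketch, representing a shape as a set of integer coordinate pairs (the function names are illustrative):

```python
def c_scaling(shape, c):
    """Replace every point of the shape with a c x c block of points."""
    return {(c * x + dx, c * y + dy)
            for (x, y) in shape
            for dx in range(c)
            for dy in range(c)}


def in_scaling(point, shape, c):
    """Membership test taken from the definition: (floor(x/c), floor(y/c)) in S."""
    x, y = point
    return (x // c, y // c) in shape   # // is floor division, also for negatives
```

Both functions describe the same set: every point produced by `c_scaling` passes `in_scaling`, and the scaled shape has c^2 times as many points as the original.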

Kao and Schweller asked whether there is a constant-sized tile set that, through concentration
programming, can assemble a scaling of any finite shape with high probability. We answer this
question affirmatively, by combining the construction of [23] with the construction of Section 3.2.2.

The following is the main theorem of Section 4.

Theorem 4.1. For all $\delta > 0$, there is a tile assembly system $\mathcal{T}_\delta = (T, \sigma, 2)$
such that, for all finite shapes $S \subseteq \mathbb{Z}^2$, there exist $c \in \mathbb{Z}^+$ and a
tile concentration assignment $\rho_S : T \to [0, \infty)$ such that $S^c$ strictly self-assembles in
$\mathcal{T}_\delta(\rho_S)$ with probability at least $1 - \delta$.

Proof. Given a finite shape $S$, the construction of [23] uses an intricate construction of a "seed
block" that "unpacks" from the hard-coded tile types a single-tape Turing machine program
$\pi \in \{0, 1\}^*$ that outputs a binary string bin($S$) representing a list of the coordinates of
$S$ (in fact, a shortest program for bin($S$) in the sense of Kolmogorov complexity [13]). The
construction of [23] is intended to utilize an asymptotically optimal number of tile types to achieve
this unpacking. The width of the seed block is then $c$, chosen to be large enough to do the unpacking,
and also large enough to accommodate the simulation of $\pi$ by a tile set that simulates single-tape
Turing machines. Once this seed block is in place, a tile set then assembles the scaled shape by
carrying bin($S$) through each block, and then using the relative order of the block to determine the
next block(s) to assemble. The order in which blocks are assembled is determined by a spanning tree of
$S$, so that any blocks with an ancestor relationship have a dependency, in that the ancestor must
be (mostly) assembled before the descendant, whereas blocks without an ancestor relationship can
potentially assemble in parallel. See [23] for more details.

In a similar fashion to the technique used by Summers [24] to combine the construction of [23]
with a temperature programming construction, we replace the seed block tiles of [23] with a tile
set that produces the program $\pi$ from tile concentrations, and utilize the remainder of the tile
set of [23] unchanged. This is illustrated in Figure 5. For our purpose, we do not require the
compactness that necessitated the "unpacking" phase of the construction of [23]. Choose $c$ to be
sufficiently large that $\pi$ can be simulated within the trapezoidal region of the c x c block of
Figure 5, and also sufficiently large that the construction of Section 3.2.2 has sufficient room to
estimate the binary string $\pi$ from tile concentrations in the center region (the "double counter
estimator") of Figure 5. Once this is done, the construction of [23] can take over and assemble the
entire scaled shape $S^c$. The portion of the construction of [23] that achieves this is a
constant-size tile set, so the tile set combined with our construction remains of constant size. □

Figure 5: The seed block used to replace the seed block of [23], from which the construction of [23]
can assemble a scaled version of the shape $S$ (encoded by a binary string representing the list of
coordinates, also labeled "S" in the figure), which is output by the single-tape Turing machine program
$\pi$. $\pi$ is estimated from tile concentrations as in Figure 3, then four copies of it are
propagated, one to each side of the block, where it is executed in four rotated, but otherwise
identical, computation regions. When completed, four copies of the binary representation of $S$ border
the seed block, which is sufficient for the construction of [23] to assemble a scaled version of $S$.

The bound on $c$ due to the double-counter estimator construction of Section 3.2.2 is
$O(2^{2|\pi|})$. Hence for any shape $S$ whose shortest program takes super-exponential time (in the
length of the output, which we may assume is at least $|\pi|$ since $\pi$ is a shortest program for
$S$), the resolution loss is no larger than that achieved by [23]. Such shapes are precisely those with
greater than exponential computational depth, in the sense of Bennett [3].

5 Tradeoff between Tile Concentration Precision and Number of Tile Types

5.1 Motivation 

We have described how a single tile set in Winfree's abstract tile assembly model, appropriately
"programmed" by setting tile concentrations, exactly assembles an n x n square with high probability,
for any sufficiently large $n$. This requires specifying tile concentrations to $O(\log n)$ bits
of precision (the constant in the $O(\cdot)$ being about 4 in our construction), which is
asymptotically optimal for most $n$ by a standard information-theoretic lower bound.

As observed by Chandran, Gopalkrishnan, and Reif [5], it is perhaps physically unrealistic to
enforce that tile concentrations are maintained to an arbitrary degree of precision.* They consider
a more realistic model of randomized self-assembly in which the system is equimolar: all tile
concentrations are equal, so that whenever $m \ge 2$ tile types compete to bind to the same position
in a growing assembly, each tile type is sampled with uniform probability $\frac{1}{m}$. They show how
to construct, for each $n \in \mathbb{Z}^+$, a height-1 line of expected length $n$ using
$O(\log n)$ tile types in a randomized equimolar tile assembly system. They show this to be a tight
upper bound for all $n$ (not just for algorithmically random $n$), and they observe that this is
superior to the $n$ tile types required to uniquely assemble a length-$n$ line in the standard
deterministic aTAM. In the standard aTAM, Adleman, Cheng, Goel, and Huang [1] similarly specify a
number $n$ by assembling an n x n square, using an optimal number of tile types
($O(\log n / \log \log n)$ in the case of squares).

Intuitively, [1, 5] and the present paper can be thought of as computing (through tile
self-assembly) a number $n$ using $O(\log n)$ bits of "input", with the input specified via two
extreme approaches:

present paper: $O(\log n)$ bits specified optimally in the tile concentrations, and $O(1)$ bits
specified in the tile types

[1, 5]: $O(1)$ bits specified in the tile concentrations (specifically, 0 bits), and $O(\log n)$ bits
specified optimally in the tile types

*Though some authors [7, 22] have suggested that it may eventually be possible to control
concentrations to the highest precision possible, by controlling exact molecular counts of chemicals in
solution.

Arbitrary tile concentrations may potentially be too permissive a model, yet it may also be the
case that the requirement of equimolar tile concentrations is overly strict. Suppose that a chemist
tells us that molecular concentrations can be controlled, but only to $g$ bits of precision for some
integer $g$. That is, the chemist can guarantee that if we request a tile concentration of
$\rho \in [0, 1]$, then the actual concentration whenever the tile is sampled will be within $2^{-g}$
of $\rho$. We must assume that each time the tile is sampled it could potentially be selected with any
concentration in the range around $\rho$; i.e., over the course of assembly the concentration could
change according to an adversarial scheme as long as the sampled concentration stays within a distance
of $2^{-g}$ of $\rho$. Then if we wish to estimate a number $n$ requiring $\log n$ bits to describe, we
could potentially estimate $g$ of these bits through concentration programming,* and the remaining
$\log(n) - g$ bits require $\Omega\left( \frac{\log(n) - g}{\log(\log(n) - g)} \right)$ tile types by
the Rothemund/Winfree lower bound [17].

5.2 Formal Definition of the Model

The construction of Section 3 can be modified to achieve the asymptotic lower bound of Section
5.1. We formalize the conditions stated in Section 5.1 as follows. Let $\epsilon > 0$. Informally,
$\epsilon$ represents a concentration error, a distance from the desired concentration with which tiles
may be sampled. That is, when a tile $t$ is sampled, its concentration could lie anywhere in the range
$[\rho(t) - \epsilon, \rho(t) + \epsilon]$. The formal effect of $\epsilon$ on the semantics of
assembly is described below.

Modeling multiplicative error in the concentration would mean replacing the above interval
with $[\rho(t)(1 - \epsilon), \rho(t)(1 + \epsilon)]$. Multiplicative error is perhaps a more realistic
model than additive error. From the perspective of pure mathematical strength, if $\rho(t) > 1$, then
multiplicative error is a stronger constraint, and if $\rho(t) < 1$, then additive error is a stronger
constraint. Since we exclusively use concentrations in the interval (0, 1), our results are at least as
strong as if we had defined error to be multiplicative. However, multiplicative error gives too much
power when $\rho(t) \ll 1$ by allowing us to "cheat" and get around the need for a tradeoff between
tile concentrations and tile complexity. In particular, it is still possible to encode an arbitrary
number of bits into the tile concentrations, by choosing $a > 1$ and ensuring that for every pair of
"potential concentrations" $\rho, \rho'$ (which to use depending on which of two values is being
encoded), if $\rho(t) > \rho'(t)$, then $\rho(t)(1 - \epsilon) > a\rho'(t)(1 + \epsilon)$. No matter
how large $\epsilon$ is (as long as it is less than 1), $\rho'(t)$ can be set sufficiently small to
obey this inequality and allow us to "tell $\rho(t)$ and $\rho'(t)$ apart" even under the error, using
a mechanism similar to the construction described in Section 6.3. Additive error is required to impose
constraints on tile concentration programming strong enough to truly limit the number of bits that can
be encoded into the tile concentrations, so that we are forced to encode some of the bits into the tile
types. Since no concentration is above 1, this assumption is not providing us with any extra power
lacking under the multiplicative error model. Therefore the results of this section are at least as
strong as if we had chosen the multiplicative error model.

To avoid tedious repetition, recall the variables defined in Section 2. Define

$\rho_\epsilon^-(t) = \max\{0, \rho(t) - \epsilon\}$,    $\rho_\epsilon^+(t) = \rho(t) + \epsilon$,

[...]

(the remaining displayed definitions, which involve sums over $t \in T_\alpha(m)$, did not survive
extraction).



*Actually, we could potentially estimate some constant multiple of $g$ bits; see the footnote in the
proof of Theorem 5.3 for an explanation.

$\rho$ and $\epsilon$ induce the subprobability measure† $P_{\rho,\epsilon} :
\mathcal{A}_\Box[\mathcal{T}] \to [0, 1]$ defined by

$P_{\rho,\epsilon}(\alpha) = \sum_{\vec{\alpha} = (\alpha_i \mid 0 \le i < k) \in A(\alpha)} \;
\prod_{i=0}^{k-2} [\ldots]$

For $p \in [0, 1]$, $\epsilon > 0$, and $X \subseteq \mathbb{Z}^2$, we say $X$ strictly
self-assembles in $\mathcal{T}(\rho)$ with probability at least $p$ subject to concentration error
$\epsilon$ if $\Pr_{P_{\rho,\epsilon}}[\mathrm{dom}\ \alpha = X] \ge p$. That is, $\mathcal{T}$
self-assembles into a shape equal to $X$ with probability at least $p$, even if tile concentrations are
maliciously and dynamically adjusted by up to an additive difference of $\epsilon$ throughout the
assembly process so as to minimize the probability of assembling $X$.

As discussed in Section 2, we will generally ignore the contribution of $p_{\alpha_i,\epsilon}(m)$ to
the probability of success, since the correctness of the constructions is not affected by choice of
frontier location so long as the assembly sequence is fair. Observation 5.1 is an analog of
Observation 2.1 using $P_{\rho,\epsilon}$. It strengthens the hypothesis that $\rho(t)$ is strictly
positive for all tile types $t$ to the hypothesis that $\rho(t) - \epsilon$ is strictly positive. This
ensures that no frontier location $m$ is starved due simply to adversarial choice of concentration of
the tile types that can bind to $m$, implying that our assumption of a fair assembly sequence is
justified.

Observation 5.1. Let $\mathcal{T} = (T, \sigma, \tau)$ be a TAS, let $\epsilon > 0$ be a tile
concentration error, let $\rho : T \to (\epsilon, \infty)$ be a tile concentration assignment assigning
concentrations strictly greater than $\epsilon$, and let $\vec{\alpha}$ be an infinite assembly
sequence resulting from the assembly of $\mathcal{T}$ according to $\rho$ and $\epsilon$ as described
above. Then $\Pr[\vec{\alpha} \text{ is fair}] = 1$.

Proof. Similar to the proof of Observation 2.1. □

We term $p_{\alpha_i}(t \mid m) - p_{\alpha_i,\epsilon}(t \mid m)$ and
$p_{\alpha_i}(m) - p_{\alpha_i,\epsilon}(m)$ the tile selection probability error and frontier
selection probability error, respectively. The model stated above defines error in specifying
concentrations, but our proofs require discussing error in tile selection probabilities. The following
lemma bounds the latter in terms of the former.

Lemma 5.2. Let $\mathcal{T} = (T, \sigma, \tau)$ be a TAS, let $\epsilon > 0$ be a tile concentration
error, let $c = \max_{\alpha \in \mathcal{A}[\mathcal{T}],\, m \in \partial\alpha} |T_\alpha(m)|$, let
$\alpha \in \mathcal{A}[\mathcal{T}]$, let $m \in \partial\alpha$ with $f_\alpha(m) \ge 1$, let
$t \in T_\alpha(m)$, and let $\rho : T \to [0, \infty)$ be a tile concentration assignment. Then
$p_\alpha(t \mid m) - p_{\alpha,\epsilon}(t \mid m) \le (c + 1)\epsilon$.

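Proof. For any $a, b, \epsilon, \delta \in \mathbb{R}$ such that $\epsilon, \delta \ge 0$, $b \ge a$,
and $b \ge 1$,

$\frac{a - \epsilon}{b + \delta}
  = \frac{a}{b} - \frac{a}{b} + \frac{a}{b + \delta} - \frac{\epsilon}{b + \delta}
  = \frac{a}{b} - \frac{a\delta}{b(b + \delta)} - \frac{\epsilon}{b + \delta}$
$\ge \frac{a}{b} - \frac{b\delta}{b(b + \delta)} - \frac{\epsilon}{b + \delta}
  = \frac{a}{b} - \frac{\delta + \epsilon}{b + \delta}
  \ge \frac{a}{b} - (\delta + \epsilon).$

By this inequality with $a = \rho(t)$, $b = f_\alpha(m)$, and $\delta = c\epsilon$,

$p_{\alpha,\epsilon}(t \mid m)
  \ge \frac{\rho(t) - \epsilon}{\sum_{t' \in T_\alpha(m)} (\rho(t') + \epsilon)}
  \ge \frac{\rho(t) - \epsilon}{f_\alpha(m) + c\epsilon}
  \ge p_\alpha(t \mid m) - (c + 1)\epsilon.$

□

†A subprobability measure on a set $X$ is a function $P : X \to [0, 1]$ such that
$\sum_{x \in X} P(x) \le 1$.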
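The algebraic inequality at the heart of the proof can be checked exhaustively on a small grid with exact rational arithmetic. This is a sanity check rather than a proof, and the function names and the grid are illustrative.

```python
from fractions import Fraction
from itertools import product


def perturbed_ratio(a, b, eps, delta):
    """Left-hand side of the inequality: (a - eps) / (b + delta)."""
    return (a - eps) / (b + delta)


def ratio_lower_bound(a, b, eps, delta):
    """Right-hand side of the inequality: a / b - (delta + eps)."""
    return a / b - (delta + eps)


def inequality_holds_on_grid():
    """Check all grid points with eps, delta >= 0, b >= a, and b >= 1."""
    a_values = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
    b_values = [Fraction(1), Fraction(3, 2), Fraction(2)]
    errors = [Fraction(0), Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)]
    return all(perturbed_ratio(a, b, e, d) >= ratio_lower_bound(a, b, e, d)
               for a, b, e, d in product(a_values, b_values, errors, errors)
               if b >= a)
```

As a concrete instance, with two competing tiles of concentrations 3/5 and 2/5, eps = 1/100 and c = 2, the selection probability of the first tile drops from 3/5 to no less than 59/102, a loss of 11/510, which is below (c + 1) eps = 3/100.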

In particular, the hypothesis $f_\alpha(m) \ge 1$ of Lemma 5.2 holds with equality in our main
construction.

5.3 Optimal Counter

We recall the construction of an optimal counter by Adleman, Cheng, Goel, and Huang [1], which
we will combine with the construction of Section 3.3 to prove the theorem. That paper showed
how to uniquely assemble an n x n square from a tile set with $O(\log n / \log \log n)$ tile types.
Much of [1] concerns achieving an optimal running time in addition to an optimal number of tile types,
but our current goal is simply to build a square while optimizing the number of tile types and bits
of precision of concentration. Therefore we will use a variant of a simpler construction described
in [1], rather than their main construction, since we need only to optimize the number of tile types.

Rothemund and Winfree [17] described how to build an n x n square from $O(1) + \log n$ tile
types, by using $\log n$ tile types to encode $n$ in binary, one bit per tile, in a "seed row"* that
grows immediately from the seed via double bonds. From there a base-2 counter (assembled by a
constant number of tile types) counts from $n$ down to 0, and forms one side of the square, from
which another constant-sized set of "filler" tiles forms the rest of the square to be as wide as the
counter is high, as in Figure 4.

*The quotes around "seed row" are to indicate that the row is not actually the seed, as the problem is
trivialized if one allows a non-constant seed assembly. What we mean by "seed row" is that the tile set
is designed so that the first thing to happen during assembly is the attachment via double bonds of
tile types hard-coded to bind next to the single seed tile.

Each of these tile types is an element of a set of cardinality at least $\log n$, yet each is only
encoding a single bit, rather than the information-theoretically optimal $\log \log n$ bits. Adleman,
Cheng, Goel, and Huang propose choosing an integer $b$ such that

$b \ge \frac{\log n}{\log \log n},$

and encoding $n$ in base $b$ in the seed row, using

$\log_b n = \frac{\log n}{\log b} \le \frac{\log n}{\log \frac{\log n}{\log \log n}}
  = \frac{\log n}{\log \log n - \log \log \log n} \le 2 \frac{\log n}{\log \log n}$

unique tile types. One can then use a base-$b$ counter to imitate the construction of Rothemund and
Winfree. However, this counter will no longer use a constant number of tile types, since $b$ depends
on $n$. But, by choosing $b$ a power of two such that

$\frac{\log n}{\log \log n} \le b < 2 \frac{\log n}{\log \log n},$    (5.1)

we can guarantee that the counter does not use too many tile types either. Specifically, the counter
alternates "increment" and "test for overflow" rows as in [17]. For the increment row, for each
$d \in \{0, 1, \ldots, b - 1\}$, and each $s \in \{$MSB, INTERIOR, LSB$\}$, and each
$c \in \{0, 1\}$, there is a tile type representing "digit $d$ with significance $s$ and borrow value
$c$", since no matter the size of $b$, the borrow value is at most 1. For the test for overflow, we
require one tile type for each $d \in \{0, 1, \ldots, b - 1\}$, and each
$s \in \{$MSB, INTERIOR, LSB$\}$. This uses

$b \cdot 3 \cdot 2 + b \cdot 3 < 18 \frac{\log n}{\log \log n}$

tile types. Therefore, combining the seed row tile types and the counter tile types, and using the
same constant number of tile types to fill in a square given a counter of the correct height, we
require at most $O(1) + 20 \frac{\log n}{\log \log n}$ tile types to assemble an n x n square.
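The choice of base in (5.1) and the resulting tile counts can be sketched numerically. The code assumes base-2 logarithms and n large enough that log log n > 1; the function names are hypothetical.

```python
import math


def choose_base(n):
    """Smallest power of two b with b >= log n / log log n, as in (5.1)."""
    x = math.log2(n) / math.log2(math.log2(n))
    b = 1
    while b < x:
        b *= 2
    return b              # b/2 < x <= b, hence x <= b < 2x


def seed_row_digits(n, b):
    """Number of base-b digits of n, i.e. seed row tile types needed."""
    digits = 0
    while n > 0:
        n //= b
        digits += 1
    return digits


def counter_tile_types(b):
    """Increment tiles (b*3*2) plus overflow-test tiles (b*3)."""
    return b * 3 * 2 + b * 3
```

For n = 2^64 - 1 this gives b = 16, a seed row of 16 digits, and 144 counter tile types, all within the bounds stated above.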

5.4 Tradeoff between Tile Concentration Precision and Number of Tile Types 

The following is the main theorem of Section 5. 

Theorem 5.3. For all $\delta > 0$, there is a constant $c$ such that the following holds. For all
$g, n \in \mathbb{Z}^+$, there is a singly-seeded tile assembly system
$\mathcal{T}_{\delta,n,g} = (T, \sigma, 2)$ such that
$|T| < c + \frac{[\ldots]}{\log(\log(n) - g)} + [\ldots]$ (a second term involving $\log \log n$), and
there is a tile concentration assignment $\rho_n : T \to [0, \infty)$ such that the set
$\{ (x, y) \mid x, y \in \{1, \ldots, n\} \}$ strictly self-assembles in
$\mathcal{T}_{\delta,n,g}(\rho_n)$ with probability at least $1 - \delta$ subject to concentration
error $2^{-g}$.

That is, $\log(n) - g$, the number of bits remaining in $n$ after $g$ bits have been estimated from
concentrations, is asymptotically optimally represented in $T$, by the lower bound of [17].

Proof. We combine the construction of Section 5.3 with that of Section 3.3, and the idea is shown in 
Figure 6. The first non-constant term in the bound on $|T|$ derives from our modified construction to 
account for concentration error in the "Bernoulli trial" sampling tiles described in Section 3.2.2. The 
second non-constant term (negligible compared to the first) comes from fixing the tiles described in 
Section 3.2.1, and is explained at the end of the proof. First we deal with the Bernoulli trial tiles. 

Essentially, the same construction of Figure 4 is used, except that the base-8 counter that formed 
the east wall of the square is now modified to be a more exotic counter that counts to $n$ using two 
different bases for the digits, the base used depending on the relative significance of the digit. 
Intuitively, the least significant digits are in base 8, and are those estimated from concentrations. 
We choose these digits to be as numerous as possible, while obeying the constraint that they can be 
precisely estimated even with the concentration error. We call these digits $n_1$ (abusing terminology 
to refer to the integer and the string of digits as the same object). Suppose $n_1$ is $j/3$ octal digits 
long (i.e., requires $j$ bits). The remaining portion of $n$ is hard-coded into the tile types, and is used 
to represent the integer $n_2 = (n - n_1)/2^j$. $n_2$ is represented in base $b$, where $b$ obeys (5.1) with $n_2$ 
substituted for $n$. 

The two strings $n_1$ and $n_2$ are concatenated as in Figure 6, and used as the start value for 
the unusual counter described earlier. This counter uses two sets of tile types. Those on the 
right are almost the same as in Figure 4, decrementing in base 8. The difference is that the most 
significant digit, instead of binding to the "filler" tiles, propagates a borrow to the next column, 
which is the least significant digit of the optimal counter of Section 5.3. Thus, the optimal counter 
is decremented once every time the base-8 counter (of width $j/3$) decrements to 0; in other words, 
the double-base counter counts to the value $n_2 \cdot 8^{j/3} + n_1 = n_2 \cdot 2^j + n_1 = n$. 

Figure 6: The assembly of an $n \times n$ square taking some bits of $n$ from concentrations, and the rest from the 
tile types. $n$'s binary expansion is split into $n_1$ and $n_2$. $n_1$ is estimated in base 8 as in Figure 4, and $n_2$ is 
hard-coded into the tile types in base $b \approx \frac{\log n_2}{\log \log n_2}$. These strings are used to drive a "double-base" 
counter that uses octal digits for the portion representing $n_1$ and base-$b$ digits for the rest, decrementing the 
base-$b$ portion once for each time the octal counter counts to 0. 
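
As a sanity check on the double-base counter, the following Python sketch splits $n$ into its $j$ low-order bits ($n_1$) and the remaining high-order bits ($n_2$), then counts decrements until the pair reaches zero. The function names are illustrative and do not come from the paper. The high part is kept as a plain integer rather than as base-$b$ digits, on the assumption that the base only affects how the digits are encoded in tiles, not the value counted.

```python
def split_bits(n: int, j: int) -> tuple[int, int]:
    """Split n into n1 (its j least significant bits) and n2 (the remaining bits)."""
    if j < 0 or j % 3 != 0:
        raise ValueError("j must be a non-negative multiple of 3 (whole octal digits)")
    n1 = n & ((1 << j) - 1)
    n2 = n >> j
    return n1, n2


def count_decrements(n1: int, n2: int, j: int) -> int:
    """Decrement the (n2, n1) pair to zero, borrowing from n2 whenever n1 is 0."""
    low, high, steps = n1, n2, 0
    while low > 0 or high > 0:
        if low > 0:
            low -= 1                # ordinary decrement of the octal part
        else:
            high -= 1               # borrow propagates into the high counter
            low = (1 << j) - 1      # octal part wraps around to 8**(j/3) - 1
        steps += 1
    return steps


n, j = 1000, 6
n1, n2 = split_bits(n, j)
print(n1, n2, count_decrements(n1, n2, j))
```

Each step lowers the combined value $n_2 \cdot 2^j + n_1$ by exactly one, which is why the number of steps equals $n$.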

It remains to describe how many digits should be allocated between $n_1$ and $n_2$ so as to allow 
$n_1$ to be estimated with precision even given the concentration error, yet minimize the number of 
tile types needed to encode $n_2$. In other words, we must choose the maximum value of $j$ (which 
minimizes the number of tile types for $n_2$) such that a $j$-bit number can be estimated with precision 
subject to concentration error $2^{-g}$. 

First, consider what happens to the construction of Section 3.2.2 if concentration error is introduced. 
Recall that to estimate a number $m$, the probability of a certain tile $t$ is set to $\frac{m 2^l + 2^{l-1}}{2^a}$, where 
$2^a = O(m^2)$ and $k = a - l$ is the number of bits needed to represent $m$. If $2^a$ trials are conducted, the 
expected value of $M$, the number of successes, is the midpoint of the interval $[m 2^l, (m+1) 2^l)$. Consider 
a concentration error of $\frac{2^l}{64 \cdot 2^a}$. By Lemma 5.2 and the fact that $|T_\alpha(\vec{m})| \le 8$ for all $\alpha \in \mathcal{A}[\mathcal{T}]$ 
and all $\vec{m} \in \partial\alpha$, this implies a tile selection probability error of at most $\frac{2^l}{4 \cdot 2^a}$. In this 
case, each time $t$ competes to bind, $t$'s probability of being chosen could have expected value as 
low as $\frac{m 2^l + 2^{l-2}}{2^a}$ and as high as $\frac{m 2^l + 3 \cdot 2^{l-2}}{2^a}$, rather than the desired $\frac{m 2^l + 2^{l-1}}{2^a}$. In other words, the 
expected value will drift in the middle two quarters of the range $\left[\frac{m 2^l}{2^a}, \frac{(m+1) 2^l}{2^a}\right)$. A straightforward 
re-analysis of the Chernoff bounds in Section 3.2.2 shows that in this case, setting $c = l - k$, the 
error probability can be bounded below $2 \cdot 0.717^{2^{c-2}}$, rather than $2 \cdot 0.717^{2^c}$ as in Section 3.2.2. 
That is, the constant in the exponent goes down by 2 to reflect the loss of precision. To make up 
for this, we must set $c$ to be 2 greater than we would otherwise to ensure that the value $M$ falls in 
the interval $[m 2^l, (m+1) 2^l)$ with the same probability. 
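
The effect of the drift can be illustrated by computing exact binomial tail probabilities for a small instance. The sketch below is a minimal illustration, not the paper's analysis: the parameter values are arbitrary assumptions, the helper name `miss_probability` is hypothetical, and the drifted probabilities and the bound $2 \cdot 0.717^{2^{c-2}}$ are taken as read from the paragraph above.

```python
from math import comb


def miss_probability(trials: int, p: float, lo: int, hi: int) -> float:
    """Exact probability that a Binomial(trials, p) count falls outside [lo, hi)."""
    inside = sum(
        comb(trials, x) * p**x * (1.0 - p) ** (trials - x) for x in range(lo, hi)
    )
    return 1.0 - inside


# Illustrative parameters (assumptions, not values from the paper).
k, c = 2, 4                  # k bits to recover, slack constant c = l - k
l = k + c                    # low-order bits of M that are discarded
a = k + l                    # 2**a Bernoulli trials are conducted
m = 1                        # the k-bit number being estimated
trials = 2**a
lo, hi = m * 2**l, (m + 1) * 2**l

p_mid = (m * 2**l + 2 ** (l - 1)) / trials        # intended selection probability
p_low = (m * 2**l + 2 ** (l - 2)) / trials        # worst-case downward drift
p_high = (m * 2**l + 3 * 2 ** (l - 2)) / trials   # worst-case upward drift

miss_mid = miss_probability(trials, p_mid, lo, hi)
miss_drift = max(miss_probability(trials, p, lo, hi) for p in (p_low, p_high))
bound_drift = 2 * 0.717 ** (2 ** (c - 2))

print(miss_mid, miss_drift, bound_drift)
```

For these parameters the drifted probabilities miss the target interval more often than the midpoint does, yet stay below the weakened bound, which is the behavior the re-analysis describes.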

Considering what this implies for the number of bits that can be reliably estimated from concentrations 
subject to the error bound, if the concentration error is $2^{-g}$, then we must have $2^{-g} \le \frac{2^l}{64 \cdot 2^a}$ 
for the above argument to give us the desired error bound. Since $k = a - l$, this is the same as requiring 
$k \le g - 6$, where $k$ is the number of bits of $m$. Recall that we combine the estimation of three such 
integers $m_1$, $m_2$, and $m_3$. Putting their bits together gives us at most $3(g - 6)$ bits that can be 
estimated from concentrations.^a Let $j = 3(g - 6)$.^b 

Let $q = \lfloor \log n \rfloor + 1$ be the number of bits in $n$. The number $n$ is represented in binary as 
$b_{q-1} b_{q-2} \cdots b_0$, where each $b_i \in \{0, 1\}$. Let $n_1 = b_{j-1} b_{j-2} \cdots b_0$, and let $n_2 = b_{q-1} b_{q-2} \cdots b_j$. Then 
the number of bits of $n_2$ is $q - j$. As argued in Section 5.3, to hard-code these bits into the tile set 
so that they bind to the left of $n_1$ as in Figure 6, it suffices to use a number of tile types at most 

$$20 \frac{q - j}{\log(q - j)} = 20 \frac{q - 3(g - 6)}{\log(q - 3(g - 6))} \le 20 \frac{q - g}{\log(q - g)},$$

so long as $g \ge 9$. This achieves the first non-constant term of the bound on $|T|$ stated in the 
theorem. 

While we have ensured that the tiles used to conduct Bernoulli trials as in Figure 3 are robust 
to concentration error, the tiles used in Figure 2 are not so robust. This can be handled in the 
following way, and is responsible for the second non-constant term of the bound on $|T|$. Observe 
that the length of the sampling line is a power of two. In our initial construction, it was necessary 
to program this length entirely into the concentrations to achieve a constant tile set. However, 
we may now use the fact that this length is succinctly describable to hard-code it into the tile 
types without increasing the tile complexity by too much. To estimate $n_1$, the sampling line must 
grow to length $2^a < n_1^{2/3} < n^{2/3}$. Hence the length $2^a$ can be described using $\log a < \log \log n^{2/3}$ 
bits, and therefore as many tile types. Practically, this encoding could be achieved by running a 
binary counter to count from $a$ (encoded in the tile types using $\log a$ tile types) down to 0, growing 
upwards, and then using the east wall of this counter, of length $a$, as the first column of an 
east-growing binary counter that will count to $2^a$, and letting the north wall of this second counter serve 
the same purpose as the sampling line of Figure 2. □ 

^a Technically, for this to be true we would have to split the three sets of Bernoulli trials across three different 
pairs of competing tile types, rather than combining them using a "product construction" into a single set of eight 
competing tile types. Such a change to the tile set is easy; for instance, one could run three sampling counters in a 
row, each one propagating through the bits estimated by the previous sampling counters. 

^b Since we imagine that a concentration error is first fixed, and then we let $n \to \infty$, we could re-phrase part of 
the statement of the theorem to, "For all $g$, for all sufficiently large $n$," where "sufficiently large" now depends on 
$g$ as well as the desired error probability $\delta$. Then we could eliminate the need to break up the estimated number 
into three subsequences, so long as no greater than 1/3 of $n$'s bits are estimated from concentrations. In this case we 
could choose $j = g - 6$. But we leave the split into three subsequences in the construction, to show that the minimum 
value of $n$ needed to work can be made not to depend on $g$. However, in our analysis, we drop the coefficient of 3 to 
make for a simpler theorem statement. 

However, as with the idea to improve the upper bound size of the sampling module to $O(n^\varepsilon)$ by splitting $n$ into 
$t$ subsequences, for $t$ a constant, rather than three, we could potentially learn $t \cdot g$ bits from concentrations, rather 
than just $g$ (even with concentration error fixed to $2^{-g}$), by creating $t$ different pairs of tile types that each sample $g$ 
bits. The physical basis of this "linear speedup for precision" is that the number of bits learned from concentrations 
is proportional to the number of tile types created. In other words, the experimenter, to double the number of bits 
learnable from concentrations, if existing tile types have already been programmed to their maximum precision of 
concentration, must double the number of tile types in use, and put the same effort into setting the concentrations 
of those new tile types as precisely as the concentration error allows. 
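
The bit budget from the proof can be tabulated with a short Python sketch. It assumes logarithms are base 2 and that $n$ is large enough for at least three bits to remain hard-coded; the function name `bit_budget` and the dictionary keys are illustrative, not from the paper.

```python
from math import log2


def bit_budget(n: int, g: int) -> dict:
    """Split the bits of n between concentrations and hard-coded tile types."""
    if g < 9:
        raise ValueError("the simplified bound in the proof assumes g >= 9")
    q = n.bit_length()               # q = floor(log2 n) + 1
    j = 3 * (g - 6)                  # bits estimated from concentrations
    hard_coded = q - j               # bits of n2, hard-coded into tile types
    if hard_coded < 3:
        raise ValueError("n is too small for this choice of g")
    return {
        "q": q,
        "j": j,
        "hard_coded_bits": hard_coded,
        # 20 (q - j) / log(q - j): tile types needed to hard-code n2
        "tile_bound": 20 * hard_coded / log2(hard_coded),
        # 20 (q - g) / log(q - g): the simpler form used in the theorem
        "simplified_bound": 20 * (q - g) / log2(q - g),
    }


budget = bit_budget(2**99, g=16)
print(budget)
```

Raising $g$ (tighter concentration control) moves bits from the hard-coded part into the sampled part, which is the tradeoff the theorem expresses.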

6 Unrealistic Aspects of the Concentration Programming Model 

This paper solves theoretical problems in a theoretical model and is not intended to be an exper- 
imental blueprint. Nonetheless, this section discusses some difficulties with the thermodynamics 
of physically implementing the concentration programming model, gives arguments that the main 
construction of this paper is robust to these difficulties, and suggests potential fixes that could 
help with implementing this construction or other constructions in the concentration programming 
model. 

6.1 Concentrations Change as Tiles are Used Up 

One obvious observation is that despite the aTAM stipulating an infinite number of copies of each 
tile type, the number of tiles is finite, and the number will decrease as more and more assemblies 
are created. If the tiles are not used up at a rate exactly proportional to their concentration, then 
their concentrations will change. 

This potential problem can be overcome by observing that the tile set of Section 3 can be par- 
titioned into "sampling" tiles, which are intended to compete with each other nondeterministically, 
and "computation" tiles, which are intended to process the results of the sampling. As we argue in 
Section 5, the sampling tiles are robust to small errors in the actual sampling probabilities. In other 
words, the concentrations are allowed to vary by a small amount, and the construction will still 
work. Therefore, we simply set the ratio of sampling tiles to computation tiles to be large, so that 
by the time all of the computation tiles have been used up (hence stopping any more assemblies 
from forming), the concentrations of the sampling tiles cannot have changed by very much, even if 
some were sampled far out of proportion to their concentrations, which is itself unlikely by the law 
of large numbers. 

However, this fix exacerbates the problem discussed in Section 6.2. Fortunately, we argue that 
these problems may be avoided as well. 

6.2 Equal Concentrations Required to Approximate the aTAM with the Kinetic 
Tile Assembly Model 

The regular aTAM - even without concentration programming - is not a completely realistic model 
of what actually happens at the molecular level when DNA tiles interact. It is a useful abstraction, 
but in reality, chemical reactions are reversible, defying the monotone nature of the aTAM, and 
DNA sequences binding with "low energy" (i.e., strength less than the temperature) have enough 
attraction to partially overcome the thermal effects that pull tiles apart, defying the constraint that 
only DNA tiles with sufficiently matched glues will bind. 

Footnote to Section 6.1: Note that this difficulty is not unique to tile concentration programming. As explained in 
Section 6.2, approximating the aTAM in the kTAM requires equal tile concentrations, a requirement that is also 
challenged by the fact that concentrations may change as tiles are used up during the assembly process. 

The kinetic Tile Assembly Model (kTAM), also introduced by Winfree [27], models reality at 
molecular scales more closely than the aTAM, at the cost of sometimes being more difficult to 
analyze. A full description is given in [27]; in this paper we only sketch the relevant intuition. The 
primary differences between the kTAM and aTAM are: 

1. Tiles may detach as well as attach. 

2. Incorrect tiles may attach. 

In fact, the rate at which tiles attach to a frontier location (the forward rate) is assumed 
proportional to their concentration. This would appear to imply that equimolar systems can do no 
computation whatsoever, as glues no longer affect which tiles go where. The key is this: the rate 
at which tiles detach (the reverse rate) is assumed inversely and exponentially proportional to the 
strength with which they bind. Thus, rather than tiles that bind with strength 2 staying attached 
forever, and tiles that bind with strength 1 immediately detaching, in the kTAM, tiles with binding 
strength 2 detach "slowly" and tiles with binding strength 1 detach "quickly". For simplicity, it is 
assumed that tiles binding with strength 0 detach immediately, so it is as if they never attached. 
In some papers (for example [6]), the locking kTAM model is used, which assumes that tiles bound 
with strength 3 will never detach, on the assumption that the energy required to break a strength 
3 bond is too large to occur due to random thermal fluctuations in any reasonable amount of time. 
Given these assumptions about strength 0 and $\ge 3$, the remaining difference with the aTAM is 
that, if an insufficiently attached tile (a tile bound with strength 1) stays on just long enough for 
another tile to bind and secure it in place with strength 2, then the tile will remain bound for a 
long time, despite its presence being a potential error. 

Winfree showed in [27] that despite the possibility of error, the behavior of any tile assembly 
system in the aTAM can be approximated arbitrarily closely by the kTAM. The trick is to slow 
down the assembly process. In the kTAM, there is no explicit temperature parameter, but the 
relative ratio of the forward rate to the various reverse rates plays a similar role. In particular, 
if the forward rate is set to be just barely larger than the reverse rate of strength 2 bound tiles, 
then assembly will drift in a sort of random walk with a small bias towards forward growth, and a 
large bias toward reverse growth of an insufficiently attached tile. Intuitively, the effect of slowing 
down the rate of net forward growth is to make more time for insufficiently attached tiles to detach 
before they can become locked in due to a second attachment that secures them with strength 2. 
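
The rate relationships described above can be sketched numerically. The expressions below follow the parametrization commonly used for the kTAM in the literature (forward rate $k_f e^{-G_{mc}}$, reverse rate $k_f e^{-b G_{se}}$ for total bond strength $b$); they are not restated in this paper, and the numeric values of `K_F`, `G_SE`, and `G_MC` are hypothetical choices made only for illustration.

```python
from math import exp


def forward_rate(k_f: float, g_mc: float) -> float:
    """Attachment rate at a site; proportional to tile concentration e^(-G_mc)."""
    return k_f * exp(-g_mc)


def reverse_rate(k_f: float, g_se: float, strength: int) -> float:
    """Detachment rate of a tile held by total bond strength `strength`."""
    return k_f * exp(-strength * g_se)


K_F = 1.0e6    # hypothetical rate constant
G_SE = 8.0     # hypothetical free energy of one strength-1 bond
G_MC = 15.8    # slightly below 2 * G_SE: forward rate barely beats strength-2 detachment

rf = forward_rate(K_F, G_MC)
rr1 = reverse_rate(K_F, G_SE, 1)
rr2 = reverse_rate(K_F, G_SE, 2)

print(rf / rr2, rr1 / rf)
```

With these values, correctly attached tiles have a small net bias toward staying, while a tile held by a single strength-1 bond detaches far faster than new tiles arrive to lock it in.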

Implicit in this approach is that there is a single well-defined forward rate. But the forward 
rate is proportional to tile concentrations, so it is not the same for each tile type unless each tile 
type is equally concentrated. Attempting to vary the relative concentrations will mean either that 
some tile type will have forward rate much higher than the strength-2 reverse rate, hence will be 
more likely to get locked in because it makes so many attempts to bind with strength 1 (or will be 
more likely to lock in some other erroneous attachment), or that some tile type will have forward 
rate much less than the strength-2 reverse rate, hence will have no net forward growth at all. This 
argument would appear to doom the utility of the concentration programming model. 

Nonetheless, we argue that the construction given in Section 3 - as well as the constructions 
of [11] and [2], for the same reasons - is robust to this problem. Hence it may be realistic to 
attempt tile concentration programming with the particular constructions of this paper and [2,11], 
although the model of tile concentration programming, in general, remains flawed due to the above 
argument. 






The first observation is that all sampling tiles attach with a single strength 2 bond on their 
input side, and under the standard kTAM model, strength-0 bonds cannot form. Hence there is no 
possibility of error, in the standard kTAM model, when one of these tiles binds, since there is no 
other input tile that could be present to, through an insufficient strength-1 attachment, temporarily 
hold an erroneous tile in place. Of course, given that strength-0 bonds are unrealistic, since even 
two mismatched glues share some DNA nucleotides in common and will bind with some positive, 
though small, strength, we must account for even the wrong kind of tile being sampled (such as one 
of the counter tiles attaching to a sampling tile location). The construction has the property that 
all sampling tiles bind via a single strength-2 bond on their input side, and have no other positive 
strength glues except at their single output side. Hence, if the wrong kind of tile attaches, it will 
be held with strength significantly less than 2, no matter what other tiles attach. For instance, if 
"strength-0" bonds are really "strength-0.1" bonds, then such insufficient attachments will never 
be held to the existing assembly with greater than strength 1.1. Therefore it is reasonable to claim 
that such errors will eventually detach, and likely detach quickly, regardless of what other errors 
collude to increase the likelihood of the erroneous tile remaining. The fix discussed in Section 6.1 
should have similar immunity to this problem, as the only errors with enough strength to potentially 
become locked in are those involving a different sampling tile than the intended binding, but the 
sampling tiles have concentrations that are relatively close to each other, even if they are much 
larger than the concentrations of the computation tiles. 

6.3 Using Concentrations Arbitrarily Close to Uniform 

To the extent that the construction still may suffer from the errors caused by variable tile concen- 
trations, there is a fix that can be applied to the construction to alleviate this problem, which will 
allow the concentrations to be programmed within an arbitrarily small interval around 0.5. The 
construction of Section 3.2.2 uses tiles whose concentrations are set in the following way. First, 
partition the unit interval into N equal-size subintervals. Set the concentration of a tile type t to 
be p, where p is the midpoint of one of these intervals, and set its rival tile to have concentration 
1 — p. Then count the number of type t tiles that occur in repeated sampling, divide by the num- 
ber of samples, and determine the subinterval in which that number lies. If enough samples are 
performed, the sampled probability will almost certainly lie in the same subinterval as p. 

This entire process could be carried out with probabilities that are arbitrarily close to 0.5, at 
the cost of requiring extra precision in the programmed concentrations. Simply repeat the entire 
process described in the previous paragraph, but instead of partitioning the unit interval [0, 1], 
partition the interval $[0.5 - \varepsilon, 0.5 + \varepsilon]$ for some small $\varepsilon > 0$. Using the construction of Figure 3, if 
$\varepsilon = 2^{-d-1}$, then the construction would simply ignore the most significant $d$ bits of the value $M$ 
except the most significant bit (as well as ignoring the least significant $l$ bits, as in Figure 3). By 
this trick, the concentrations of the sampling tile types can be made arbitrarily close to uniform. 
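
To see why the leading bits carry no information once probabilities are confined near 0.5, the following Python sketch lists the midpoints of an equal partition of $[0.5 - \varepsilon, 0.5 + \varepsilon]$ and prints their leading bits. It assumes $\varepsilon = 2^{-d-1}$ and reads the statement above as saying that the $d$ bits following the most significant bit are forced; the function names and the parameter `w` (the partition has $2^w$ subintervals) are illustrative.

```python
from fractions import Fraction


def midpoints_near_half(d: int, w: int) -> list[Fraction]:
    """Midpoints of 2**w equal subintervals of [1/2 - eps, 1/2 + eps], eps = 2**-(d+1)."""
    eps = Fraction(1, 2 ** (d + 1))
    width = 2 * eps / 2**w
    start = Fraction(1, 2) - eps
    return [start + (2 * idx + 1) * width / 2 for idx in range(2**w)]


def leading_bits(p: Fraction, count: int) -> str:
    """First `count` bits of the binary expansion of p, for 0 <= p < 1."""
    return format(p.numerator * 2**count // p.denominator, f"0{count}b")


mids = midpoints_near_half(d=3, w=2)
for p in mids:
    print(p, leading_bits(p, 4), leading_bits(p, 6))
```

Every midpoint begins with either `0111` or `1000`, so after the first bit the next $d = 3$ bits are determined and only the later bits distinguish the subintervals.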

However, the tile types that determine the length of the sampling line are still given concentrations 
arbitrarily close to 0 or 1 as $n \to \infty$. In Figure 2, stage $i$ has expected length $\frac{1}{p}$ if the 
probability of tile type $S_i$ is set to $p \in [0, 1]$. If we want the expected length of a stage to be $L$, this 
requires setting $p = 1/L$, which approaches 0 for large $L$. This part of the construction may also be 
adjusted to utilize concentrations arbitrarily close to uniform by the following fix, which we describe 
only informally but which is straightforward to implement similarly to the constructions described earlier 
in this paper. 

Let $\varepsilon > 0$, and suppose we require that all concentrations must lie in the interval $[0.5 - \varepsilon, 0.5 + \varepsilon]$. 
Choose the smallest positive integer $i$ such that $2^{-i} < \varepsilon$; we will set concentrations within the 
interval $[0.5 - 2^{-i}, 0.5 + 2^{-i}]$. Partition the interval $[0.5 - 2^{-i}, 0.5)$ into an infinite number of 
subintervals 

$$\begin{aligned}
S_i &= [0.5 - 2^{-i},\; 0.5 - 2^{-i-1}), \\
S_{i+1} &= [0.5 - 2^{-i-1},\; 0.5 - 2^{-i-2}), \\
S_{i+2} &= [0.5 - 2^{-i-2},\; 0.5 - 2^{-i-3}), \\
&\;\;\vdots
\end{aligned}$$

The idea is to set a tile $t$'s concentration $p$ to be in the midpoint of an interval $S_j$ (and set 
its competing tile $t'$ to have concentration $1 - p$), and then, for each $k = i, i+1, i+2, \ldots$, to test 
whether $t$'s concentration is in the interval $S_k$, stopping at the first $k$ for which this test reports 
"yes". As we argue later, the very act of conducting these tests will grow a line of appropriate 
length to conduct the sampling of Section 3.2.2. 

If we maintain a count of the total number of appearances of $t$ that appear in a "large" (defined 
below) power of 2 number of Bernoulli trials, then the frequency with which $t$ appears is within 
the interval $S_j$ if and only if the most significant $j + 1$ bits of the counter that counts the number 
of appearances of $t$ is the string $01^{j-1}0$. Hence the $k$-th test is "run a 'large' power of 2 number of 
Bernoulli trials by letting $t$ and $t'$ compete, count the number of occurrences of $t$, and report 'yes' 
if the most significant $k + 1$ bits of the counter are the string $01^{k-1}0$". 
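
The equivalence between interval membership and the bit-prefix test can be checked exhaustively for a small counter. This Python sketch is illustrative only: it uses a single counter of `log_trials` bits for every test, whereas the construction runs a different number of trials for each test, and all function names are hypothetical.

```python
from fractions import Fraction


def in_interval(freq: Fraction, k: int) -> bool:
    """True iff freq lies in S_k = [1/2 - 2^-k, 1/2 - 2^-(k+1))."""
    half = Fraction(1, 2)
    return half - Fraction(1, 2**k) <= freq < half - Fraction(1, 2 ** (k + 1))


def prefix_test(successes: int, log_trials: int, k: int) -> bool:
    """Report 'yes' iff the top k+1 bits of the success counter are 0 1^(k-1) 0."""
    bits = format(successes, f"0{log_trials}b")
    return bits[: k + 1] == "0" + "1" * (k - 1) + "0"


def first_yes(successes: int, log_trials: int, i: int):
    """Run tests k = i, i+1, ... and return the first k reporting 'yes', else None."""
    for k in range(i, log_trials):
        if prefix_test(successes, log_trials, k):
            return k
    return None


# 116 successes out of 2**8 trials is frequency 29/64, the midpoint of S_4.
print(first_yes(116, 8, 2))
```

The prefix comparison touches only the counter's top bits, which is what makes the test cheap to implement with tiles.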

More precisely, recall that the analysis of Section 3.2.2 showed that if we fix a positive integer $k$, 
conduct $n_k = 2^{k + l_k}$ Bernoulli trials with success probability $p$ set to be in the midpoint of a dyadic 
interval, where $l_k = k + c$ and $c$ is a constant, then the probability that the $k$ most significant bits 
of the binary expansion of the number of successes is not equal to the $k$ most significant bits of the 
binary expansion of $p$ is at most $2 \cdot 0.717^{2^c}$. We use this fact, but now we define the constant $c$ 
to depend on $k$ and define it as $c_k = k + c'$, for $c'$ a constant (which will be chosen based on our 
desired probability $\delta$ of failure of the entire construction). Assuming that the interval $p$ occupies is 
$S_j$, we want to bound the probability that one of the tests inaccurately identifies $S_k$ as the interval, 
for $k \ne j$. By the above argument, this probability is at most 

$$\sum_{k=i}^{\infty} 2 \cdot 0.717^{2^{c_k}} \le \sum_{k=1}^{\infty} 2 \cdot 0.717^{2^{k+c'}} \le \sum_{k=1}^{\infty} 2 \cdot 0.717^{2(k+c')} = 2 \cdot 0.717^{2c'} \sum_{k=1}^{\infty} 0.717^{2k} < 2 \cdot 0.717^{2c'-2}.$$

Hence by choosing $c'$ sufficiently large we can make the probability of error sufficiently small to 
ensure that all tests report the correct answer. It is straightforward to hard-code the constant $c'$ 
into a tile set as in Figure 4, as is maintaining the number $c_k$ (which begins at the value $c' + i$ 
and is incremented once for each test). By implementing this tile set to grow below the x-axis, 
the northern boundary of the assembly produced may serve the same function as the northern 
boundary of the tiles of Figure 2. 
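
The claim that a large enough $c'$ drives the total error down can be checked numerically. The sketch below assumes a per-test error bound of $2 \cdot 0.717^{2^{k+c'}}$ (that is, $c_k = k + c'$) and compares a partial sum over the tests with the closed-form quantity $2 \cdot 0.717^{2c'-2}$; both function names are illustrative, and the truncation at 30 terms is an assumption justified by how quickly the terms vanish.

```python
def total_test_error(c_prime: int, first_test: int = 1, terms: int = 30) -> float:
    """Partial sum of the per-test error bounds 2 * 0.717**(2**(k + c_prime))."""
    return sum(
        2 * 0.717 ** (2 ** (k + c_prime))
        for k in range(first_test, first_test + terms)
    )


def closed_form_bound(c_prime: int) -> float:
    """The simpler upper bound 2 * 0.717**(2*c_prime - 2)."""
    return 2 * 0.717 ** (2 * c_prime - 2)


for c_prime in range(1, 7):
    print(c_prime, total_test_error(c_prime), closed_form_bound(c_prime))
```

The summed error falls off doubly exponentially in $c'$, so the closed form is loose but sufficient for choosing $c'$ from a target failure probability.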

It remains to show that the horizontal length to which this structure grows may be controlled 
to the same precision as the bottom line of Figure 3. Recall that this line could be programmed 
to grow to a length in the interval $[2^{a-1}, 2^a)$, for arbitrary $a \in \mathbb{Z}^+$. The above construction can 
be programmed to grow to test an arbitrary number of intervals.^c By the argument above, the 
$k$-th interval test requires $n_k = 2^{k + l_k} = 2^{2k + c_k} = 2^{3k + c'}$ Bernoulli trials. Hence the total number of 
trials to test intervals $i$ through $j$ is 

$$\sum_{k=i}^{j} 2^{3k + c'} = 2^{c'} \sum_{k=i}^{j} 2^{3k}.$$

Since $i$ and $c'$ are constants that depend only on the desired closeness to uniformity of the 
concentrations and error probability, respectively, we can hard-code small lengths into the tile types and 
state that for each $j \in \mathbb{Z}^+$, the sampling line can be made to have length 

$$\sum_{k=1}^{j} 2^{3k}$$

with high probability. This implies that we may control the length of the sampling line to within 
a multiplicative factor of 8. By choosing $d = 2^{3j}$ a power of 8 such that $2^a \le d < 8 \cdot 2^a$ for the 
value $a$ in Section 3.2.2, the sampling line will have sufficient length to achieve the error probability 
bounds of Section 3.2.2, yet will still have length bounded by $O(n^{2/3})$, hence will fit inside the $n \times n$ 
square. 
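
The length bookkeeping can be made concrete with a few lines of Python. This is a sketch under the assumption that the $k$-th test uses $2^{3k + c'}$ trials; the helper names are hypothetical.

```python
def total_trials(i: int, j: int, c_prime: int) -> int:
    """Total Bernoulli trials to run interval tests i through j."""
    return sum(2 ** (3 * k + c_prime) for k in range(i, j + 1))


def power_of_eight_at_least(a: int) -> tuple[int, int]:
    """Smallest j, with d = 2**(3j), such that 2**a <= d < 8 * 2**a."""
    j = -(-a // 3)           # ceiling of a / 3
    return j, 2 ** (3 * j)


print(total_trials(2, 4, 3))
print(power_of_eight_at_least(10))
```

Because consecutive achievable lengths differ by a factor of 8, rounding the target $2^a$ up to the next power of 8 costs at most that factor, which is absorbed by the $O(n^{2/3})$ bound.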

Thus all concentration-programmed tile types, both the tile types of Section 3.2.2 doing the 
Bernoulli trial sampling and the tile types of Section 3.2.1 determining the number of samples, 
can be assigned concentrations arbitrarily close to uniform, hence allowing an arbitrarily close 
approximation of Winfree's approximation. 



7 Conclusion 

7.1 Potential Improvements 

The focus of the present paper is on conceptual clarity. We have therefore described the simplest 
(i.e., easiest to understand, but not necessarily smallest) version of the tile assembly system that 
achieves the desired asymptotic result that an n x n square assembles with high probability for 
sufficiently large n. We now observe that this theoretical result could be improved in practice by 
complicating the tile set. 

Our implementation of the tile set uses approximately $4500 + 9c + 4r$ tile types, where, for example, 
$r = 113$ and $c = 7$ are sufficient to achieve error probability $\delta < 0.01$. The tiles are so numerous 
because of the need to simultaneously represent 4 bits in a tile, in addition to information such as 
the significance of the bit (MSB, LSB, or interior bit), and doing computation such as addition, 
which requires tiles that can handle the 2^ possible input bit + carry signals. Putting together a 
few such modules of tile sets results in thousands of tiles before too long. The number of tile types 
could be reduced by splitting the estimation of $m_1$, $m_2$, and $m_3$ into three distinct geometrical 
regions, so that each tile is required to remember less information. This would complicate the tile 
set, as it would require more shifting tricks to ensure sufficient room for all counters, and would 
require bringing the bits back together again at the end, but it would likely reduce the number of 
tile types. 

^c (Footnote to Section 6.3.) Specifically, if we wish to quit after testing precisely $b$ intervals, set $p$ to lie in the 
midpoint of the interval $S_{i+b-1}$. 

A large value of $n$ is required to achieve a probability of success at least $(1 - \delta)$ for reasonably 
small $\delta$; n > 8 • 10^ is required to estimate $n$ with 99% chance of correctness (in theory, but 
apparently not in practice, as discussed in Section 3.4). This shortcoming can be compensated for in 
a number of ways. 

In a similar spirit to the linear speedup theorem, more than three simultaneous Bernoulli trials 
may be conducted with each sampling tile. For example, conducting 6 Bernoulli trials with each 
sampling tile would estimate two bits of each of $m_1, \ldots, m_3$ per sampling tile, rather than one bit, 
halving the required length of the sampling line. This would result in a prohibitively large tile 
set, however, as the number of tile types increases exponentially with the number of simultaneous 
Bernoulli trials per tile type. 

A conceptually simpler and practically more feasible improvement is to use 0/1-valued tile 
concentrations to simulate tile type programming (i.e., designing tile types specially to build a 
particular size square, as in [17]) for small values of n, by including tile types that deterministically 
construct an n x n square for each small n, setting concentrations of those tiles to be 1 and setting 
concentrations of all other tiles to be 0. Many of the same square-building tile types can be reused 
for different values of $n$ (see [17]), with the different values of $n$ largely being dependent on $\approx \log n$ 
hard-coded tiles that immediately attach to the seed. For singly-seeded tile systems, $3 \log N + O(1)$ 
tiles are required to handle all $n < N$: for each $i \in \{1, \ldots, \log N\}$ that represents the position 
of a bit of $n$, three tile types are required, one representing "0 at position $i$", one representing 
"1 at position $i$" (each of which has double-strength bonds on two sides), and one representing 
"end of string at position $i$" (with a double-strength bond on one side and a zero-strength bond 
on the other). Though this solution lacks the "feel" of tile concentration programming, it is likely 
that real-life implementations of tile concentration programming will need to use such hard-coding 
tricks for smaller structures that lack the space to carry out the amount of sampling required to 
reconstruct precise inputs solely from tile concentrations. 
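
The counting argument for the hard-coded fallback can be sketched as follows. This is an illustration under stated assumptions rather than the paper's tile set: tile types are represented only by labels, position 1 is taken to be the least significant bit, and the end-of-string tile is assumed to occupy the position of the most significant bit (which is always 1). The function names are hypothetical.

```python
def hard_coded_tile_types(log_n_max: int) -> list[tuple]:
    """Three labels per bit position: bit 0, bit 1, and end-of-string."""
    types = []
    for i in range(1, log_n_max + 1):
        types += [("bit", i, 0), ("bit", i, 1), ("end", i)]
    return types


def concentrations_for(n: int, log_n_max: int) -> dict:
    """0/1-valued concentrations selecting the tiles that spell out n."""
    q = n.bit_length()
    if not 1 <= q <= log_n_max:
        raise ValueError("n does not fit in the available positions")
    # Assumption: bits below the top position use bit tiles; the top uses the end tile.
    chosen = {("bit", i, (n >> (i - 1)) & 1) for i in range(1, q)}
    chosen.add(("end", q))
    return {t: (1 if t in chosen else 0) for t in hard_coded_tile_types(log_n_max)}


types = hard_coded_tile_types(10)
conc = concentrations_for(5, 10)
print(len(types), [t for t, c in conc.items() if c == 1])
```

Only the tiles spelling out the chosen $n$ receive concentration 1, so a single tile set of size $3 \log N$ serves every $n$ in range.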

An alternate improvement to the tile set would be to combine the present technique with the 
Kao-Schweller technique of building a sampling line inside of a square, to more efficiently use the 
space available to carry out the estimation. However, square-building is not necessarily the only 
application of this technique, as shown in Section 4. 

The primary novel contribution of this paper is a tile set that, through appropriate tile 
concentration programming, forms a thin structure of length $O(n^{2/3})$ and height $O(\log n)$,^d whose 
rightmost tiles encode the value of $n$ in binary. This binary string could be used to assemble useful 
structures other than squares, such as rectangles and other supersets of the sampling structure 
that are "easily encoded" in a binary string of length $O(\log n)$. For the task of building a square, 
this construction wastes the $\approx n^2$ space available above the thin rectangle, but for computing 
other structures, it may be advantageous that the rectangle is kept thin. For instance, biochemists 
routinely use filters (e.g., Millipore Ultrafiltration Membranes) and porous resins [4] to separate 
proteins based on size, in order to isolate one particular protein for study. The ability to precisely 
control the size of the filter holes or resin beads would allow for more targeted filtering of proteins 
than is possible at the present time. DNA is likely too reactive with amino acids to be used as the 
substrate for such a structure, so an implementation of the tile assembly model not based on DNA 
would be required for such a technique. 

^d By partitioning $n$'s binary representation into $t$ rather than three subsequences, for $t \in \mathbb{N}$ a constant, the number 
of trials needed to estimate $n$ is $O(n^{2/t})$. However, the constant factors in the $O()$ increase, making the technique 
even less feasible for small values of $n$. But if some application requires an asymptotically very short line, the line 
can be made length $O(n^\varepsilon)$ for any $\varepsilon > 0$ using this technique. 

Similarly, polyacrylamide gel electrophoresis [21], another technique for discriminating biological 
molecules on the basis of size, requires molecular mass size markers, which are control molecules 
of known molecular mass, in order to compare against the molecule of interest on the gel. At the 
present time, some naturally-occurring molecules of known mass are used, but their masses are 
not controllable, and the ability to quickly and easily assemble molecules of precisely a desired 
target mass would be useful in experiments requiring mass markers that differ from the standards. 
Again, DNA is a special case in which this idea is unnecessary, since precise standards have been 
developed for DNA gels (e.g., Novagen DNA Markers). But the tile assembly model may one day 
be implemented using substances that are appropriate for a protein gel. 

7.2 Open Questions 

The proof of Theorem 3.1 shows that for every $\delta, \varepsilon > 0$, a tile set exists such that, for every $n \in \mathbb{N}$, 
appropriately programming the tile concentrations results in the self-assembly of a structure of size 
$O(n^\varepsilon) \times O(\log n)$ whose rightmost tiles represent the value $n$ with probability at least $1 - \delta$. Is this 
optimal? 

Formally, say that a tile assembly system $\mathcal{T} = (T, \sigma, 2)$ is $\delta$-concentration programmable (for 
$\delta > 0$) if there is a (total) computable function $r : \mathcal{A}_{\Box}[\mathcal{T}] \to \mathbb{N}$ (the representation function) 
such that, for each $n \in \mathbb{N}$, there is a tile concentration assignment $\rho : T \to [0, \infty)$ such that 
$\Pr[r(\mathcal{T}(\rho)) = n] \geq 1 - \delta$. In other words, $\mathcal{T}$, programmed with concentrations $\rho$, almost certainly 
self-assembles a structure that "represents" $n$, according to the representation function $r$, and such 
a $\rho$ can be found to create a high-probability representation of any natural number. In the case of 
the construction of Section 3, $r(\mathcal{T}(\rho))$ outputs the integer represented in binary on the right side 
of the structure of Figure 3 (and fits into space $O(n^{2/3})$ if the tile types described in Figure 4 are 
not included in $T$). 
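
As a rough illustration of the shape of this definition, the following Python sketch works through a deliberately tiny toy case that is not the construction of Section 3. It assumes a hypothetical system that encodes a single bit: the tile type for the intended bit is given relative concentration `p_one`, a fixed number of attachment events each sample a tile independently in proportion to concentration, and the representation function is a majority vote. The names `majority_success_probability` and `samples_needed` are invented for this sketch. It computes the exact probability that the vote is correct and the smallest odd sample count that reaches a requested confidence of 1 - delta.

```python
from math import comb


def majority_success_probability(p_one: float, samples: int) -> float:
    """Exact probability that a strict majority of `samples` independent
    draws show the intended tile, when each draw shows it with
    probability `p_one` (its relative concentration in this toy model)."""
    return sum(
        comb(samples, j) * p_one**j * (1 - p_one) ** (samples - j)
        for j in range(samples // 2 + 1, samples + 1)
    )


def samples_needed(p_one: float, delta: float, limit: int = 1000) -> int:
    """Smallest odd sample count whose majority vote is correct with
    probability at least 1 - delta. The search stops below `limit` so
    that the binomial coefficients stay within floating-point range."""
    for samples in range(1, limit, 2):
        if majority_success_probability(p_one, samples) >= 1 - delta:
            return samples
    raise ValueError("no sample count below the limit reaches 1 - delta")


if __name__ == "__main__":
    for delta in (0.2, 0.05, 0.01):
        print(delta, samples_needed(0.75, delta))
```

In this toy setting, a smaller delta is paid for with more samples, that is, with a larger structure. Question 7.1 below asks how small that cost can be made when the value represented is an arbitrary natural number rather than a single bit.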

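Question 7.1. Is the following statement true? For each $\delta > 0$, there is a tile assembly system 
$\mathcal{T}$ and a representation function $r : \mathcal{A}_{\Box}[\mathcal{T}] \to \mathbb{N}$ such that $\mathcal{T}$ is $\delta$-concentration programmable and, 
for each $\varepsilon > 0$ and all but finitely many $n \in \mathbb{N}$, $\Pr[|\mathrm{dom}\ \mathcal{T}(\rho)| < n^{\varepsilon}] \geq 1 - \delta$. If so, what is the 
smallest bound that can be written in place of $n^{\varepsilon}$? 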

In other words, we are asking if the $O(n^{\varepsilon})$ upper bound on the size of the self-assembled structure 
representing $n$ that is obtained in the proof of Theorem 3.1 is optimal. That structure has size 
$O(n^{2/3})$, and for each $\varepsilon > 0$, the construction could be modified to have size $O(n^{\varepsilon})$. Is there a single 
construction whose size is at most $n^{\varepsilon}$ for all $\varepsilon > 0$, for sufficiently large $n$? $\Omega(\log n)$ is a clear lower 
bound on the size of the structure, as it requires $\log n$ space to represent most integers $n$, but it 
would be interesting to find a larger lower bound than $\Omega(\log n)$, or a smaller upper bound than 
$O(n^{\varepsilon})$. 
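
To give a feel for the distance between the logarithmic lower bound and a polynomial upper bound, the short Python sketch below compares the number of binary digits of n with n raised to a small power. The exponent 1/3 is an arbitrary choice made for illustration, and the helper names `binary_width` and `bound_gap` are invented here. Constant factors hidden by the asymptotic notation are ignored.

```python
def binary_width(n: int) -> int:
    """Number of binary digits needed to write n >= 1,
    which equals floor(log2 n) + 1."""
    return n.bit_length()


def bound_gap(n: int, eps: float) -> float:
    """Ratio of the polynomial bound n**eps to the binary width of n."""
    return n**eps / binary_width(n)


if __name__ == "__main__":
    for k in (10, 20, 40, 80):
        n = 2**k
        print(k, binary_width(n), round(bound_gap(n, 1 / 3), 2))
```

The ratio grows without bound as n increases for any fixed positive exponent, which is why a construction of size polylogarithmic in n, if one exists, would be a substantial improvement.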

The next question is less formal. Section 6 discusses unrealistic aspects of the concentration 
programming model, and goes into the detail of the construction of Section 3 to explain why 
that particular construction does not suffer from the problems associated with these unrealistic 
assumptions. However, a good model of reality is one that requires no excessively unrealistic 
assumptions, in which conclusions reached within the model can be inferred to apply to reality 
without having to inspect the detail of the argument leading to the conclusion. 

Question 7.2. Is there a model of concentration programming that "automatically avoids" the 
problems discussed in Section 6, but which retains (some of) the power of the constructions of [2], 
[11], and the present paper? 

A better model of concentration programming would free future tile concentration programming 
constructions from requiring the equivalent of Section 6 of the current paper. 